Abstract: In another related work, U-statistics were used for non-asymptotic "average-case" analysis of random compressed sensing matrices. In this companion paper the same analytical tool is adopted differently: here we perform non-asymptotic "worst-case" analysis.

Simple union bounds are a natural choice for "worst-case" analyses; however, their tightness is an issue (and has been questioned in previous works). Here we focus on a theoretical U-statistical result which potentially allows us to prove that these union bounds are tight. To our knowledge, this kind of (powerful) result is completely new in the context of CS. This general result applies to a wide variety of parameters, and is related to (Stein-Chen) Poisson approximation. In this paper, we consider i) restricted isometries, and ii) mutual coherence. For the bounded case, we show that $k$-th order restricted isometry constants have tight union bounds when the number of measurements is $m = O(k(1 + \log(n/k)))$. Here we require the restricted isometries to grow linearly in $k$; however, we conjecture that this result can be improved to allow them to be fixed. Also, we show that mutual coherence (with the standard estimate $\sqrt{(4\log n)/m}$) has very tight union bounds. For coherence, the normalization complicates general discussion, and we consider only the Gaussian and Bernoulli cases here.

Index Terms: approximation, compressed sensing, statistics, random matrices



I. Introduction 

Recovery analysis in compressed sensing (CS) is usually framed in the context of matrix parameters. The restricted isometry constant is arguably the most commonly studied, as it yields an important result: sparse signals with $k$ components can be recovered from on the order of $k\log(n/k)$ samples of an $n$-dimensional ambient space [1], [2]. On the other hand, a wide variety of parameters have been studied, each having its own desirable features (e.g., simplicity, accuracy, etc.). To list a few, we have mutual coherence [3]- m, Karush-Kuhn-Tucker (KKT) conditions for sparsity pattern recovery (involving matrix pseudoinverses) [5]-[8], and the powerful null-space property [9]-[11].

To handle a wide variety of parameters, we would like a common framework that encompasses their common features. Here we consider random analysis, and in related work [12] we proposed Hoeffding's U-statistics as a good analytical tool. This is due to the wide availability of general U-statistical theory and, most importantly, the fact that these statistics model a common feature shared by the CS parameters listed above: these parameters are defined combinatorially over all subsets of a fixed size. Furthermore, the tool applies well to the non-asymptotic regime, where important practical trends are the focus of recent works with similar random analysis themes ifTsl - lfTsl, or of deterministic-type CS analysis fS], fE\, fW\, fTl]. There is, however, no discussion of U-statistics in the CS literature.

In [12] we show how U-statistics have a natural "average-case" interpretation, which we apply to so-called statistical restricted isometry property (StRIP) recovery guarantees [8]. In this work, however, we show how U-statistics also apply well to "worst-case" analysis. Most CS analyses performed for random matrices are of the "worst-case" nature, whereby past seminal results established optimal rates and powerful recovery guarantees [1], [2], [18]-[20]. These analyses mostly involve taking union bounds over a large number of terms, and we are interested in a question posed earlier by Blanchard et al. [19]: how tight are these union bounds?

A result that can ascertain the tightness of such union bounds could be very useful, since these bounds constitute some of the simplest ways to analyze CS parameters. While past efforts to answer this question involve innovative bounding methods [20] and numerical explorations [21], here we discuss a U-statistical result that can answer this question by theoretical proof. This comes from (Stein-Chen) Poisson approximation [22, ch. 2], which provides theoretical guarantees on bound tightness, and can potentially be used to show that the union bound cannot be drastically improved. In other words, from the standpoint of characterizing the behavior of CS parameters for recovery analysis, it could (possibly) be shown that simple union bounds are sufficiently tight. To the best of our knowledge, this kind of powerful result has never been investigated before in the CS context. The U-statistical result is general, and applies to a wide variety of parameters. For brevity we only discuss two cases here, "worst-case" restricted isometries and mutual coherence, with the more complicated null-space property left for future research.

Contributions: We assume throughout that the matrix columns are independently sampled. We utilize a (non-asymptotic) approximation theorem that predicts how "worst-case" U-statistics can be well approximated by a Poisson distribution (Theorem 2.N). A good Poisson approximation implies that union bound analyses will be sufficiently tight, and theoretical approximation error bounds can be given. These error bounds require second-order joint probabilities. Let $n$ and $m$ denote the block length and the number of measurements, respectively. We consider two cases: i) restricted isometries and ii) mutual coherence. For i), empirical studies suggest good approximation [27]. Here Ahlswede-Winter techniques are used to obtain the necessary joint probabilities. These techniques lead to simplified arguments; on the other hand, they have certain weaknesses that lead to a sub-optimal rate. Nevertheless, for bounded matrix entries and $m$ on the order of $k(1 + \log(n/k))$, the approximation error can be shown to decay exponentially in $m$, while requiring the restricted isometry constants to grow linearly in $k$ (Theorem 1). We conjecture that this result can be improved. For ii), we show that when the mutual coherence is on the order of $\sqrt{(4\log n)/m}$ (the standard estimate, see [7]), the Poisson approximation error decays exponentially in $m$ (Theorem 2). For simplicity, for ii) we only consider the Gaussian and Bernoulli cases.

Organization: We begin with relevant background on CS and U-statistics in Section II. In Section III we present a Poisson approximation theorem describing "worst-case" behavior. In Section IV we derive theoretical bounds related to "worst-case" Poisson approximation, for both the restricted isometry and mutual coherence cases. We conclude in Section V.

Notation: The set of real numbers is denoted $\mathbb{R}$. Deterministic quantities are denoted using $a$, $\mathbf{a}$, or $\mathbf{A}$, where bold fonts denote vectors (e.g., $\mathbf{a}$) or matrices (e.g., $\mathbf{A}$). Random quantities are denoted using upper-case italics, where $A$ is a random variable (RV), and $\mathbf{A}$ a random vector/matrix. Let $\Pr\{A \le a\}$ denote the probability that the event $\{A \le a\}$ occurs. Sets are denoted using braces, e.g., $\{1, 2, \cdots\}$. The notation $i, j, \ell, \omega$ is for indexing. The notation $\mathbb{E}$ denotes expectation. We let $\|\cdot\|_p$ denote the $\ell_p$-norm for $p = 1$ and $2$.

II. Preliminaries 
A. Compressed Sensing (CS) Theory

A vector $\alpha$ is said to be $k$-sparse if at most $k$ vector coefficients are non-zero (i.e., its $\ell_0$-distance satisfies $\|\alpha\|_0 \le k$). Let $n$ be a positive integer that denotes block length, and let $\alpha = [\alpha_1, \alpha_2, \cdots, \alpha_n]^T$ denote a length-$n$ signal vector with signal coefficients $\alpha_i$. The best $k$-term approximation $\alpha_k$ of $\alpha$ is obtained by finding the $k$-sparse vector $\alpha_k$ that has minimal approximation error $\|\alpha_k - \alpha\|_2$.

Let $\Phi$ denote an $m \times n$ CS sampling matrix, where $m < n$. The length-$m$ measurement vector $b = [b_1, b_2, \cdots, b_m]^T$ of some length-$n$ signal $\alpha$ is formed as $b = \Phi\alpha$. Recovering $\alpha$ from $b$ is challenging, as $\Phi$ possesses a non-trivial null-space. We typically recover $\alpha$ by solving the (convex) $\ell_1$-minimization problem

$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|\hat{b} - \Phi\alpha\|_2 \le \epsilon. \qquad (1)$$

The vector $\hat{b}$ is a noisy version of the original measurements $b$, and $\epsilon$ bounds the noise error, i.e., $\epsilon \ge \|\hat{b} - b\|_2$. Recovery conditions have been considered in many flavors, e.g., f\\- ifm, mostly by studying parameters of the sampling matrix $\Phi$.

For $k < n$, the $k$-th restricted isometry constant $\delta_k$ of an $m \times n$ matrix $\Phi$ equals the smallest constant that satisfies

$$(1 - \delta_k)\|\alpha\|_2^2 \le \|\Phi\alpha\|_2^2 \le (1 + \delta_k)\|\alpha\|_2^2 \qquad (2)$$

for any $k$-sparse $\alpha$ in $\mathbb{R}^n$. The following well-known recovery guarantee is stated with respect to $\delta_k$ in (2).

Theorem A, cf. [23]. Let $\Phi$ be the sensing matrix. Let $\alpha$ denote the signal vector. Let $b$ be the measurements, i.e., $b = \Phi\alpha$. Assume that the $(2k)$-th restricted isometry constant $\delta_{2k}$ of $\Phi$ satisfies $\delta_{2k} < \sqrt{2} - 1$, and further assume that the noisy version $\hat{b}$ of $b$ satisfies $\|\hat{b} - b\|_2 \le \epsilon$. Let $\alpha_k$ denote the best $k$-term approximation to $\alpha$. Then the $\ell_1$-minimum solution $\alpha^*$ to (1) satisfies

$$\|\alpha^* - \alpha\|_1 \le c_1 \|\alpha - \alpha_k\|_1 + c_2\,\epsilon,$$

for small constants $c_1 = 4\sqrt{1 + \delta_{2k}}\,/\,(1 - \delta_{2k}(1 + \sqrt{2}))$ and $c_2 = 2(\delta_{2k}(1 - \sqrt{2}) - 1)\,/\,(\delta_{2k}(1 + \sqrt{2}) - 1)$.
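As a small numerical illustration of Theorem A, the following Python sketch evaluates the constants $c_1$ and $c_2$ exactly as written above; the function name `theorem_a_constants` is our own (hypothetical). It also makes visible why the common denominator $1 - \delta_{2k}(1 + \sqrt{2})$ forces $\delta_{2k} < \sqrt{2} - 1$ for the constants to be finite and positive.

```python
import math


def theorem_a_constants(delta_2k: float) -> tuple:
    """Return (c1, c2) of Theorem A for a given restricted isometry constant delta_2k.

    Both constants stay finite and positive only while
    1 - delta_2k * (1 + sqrt(2)) > 0, i.e. delta_2k < sqrt(2) - 1.
    """
    denom = 1.0 - delta_2k * (1.0 + math.sqrt(2.0))
    if delta_2k < 0.0 or denom <= 0.0:
        raise ValueError("Theorem A constants need 0 <= delta_2k < sqrt(2) - 1")
    c1 = 4.0 * math.sqrt(1.0 + delta_2k) / denom
    # c2 = 2(delta(1 - sqrt 2) - 1) / (delta(1 + sqrt 2) - 1), transcribed from the text
    c2 = 2.0 * (delta_2k * (1.0 - math.sqrt(2.0)) - 1.0) / (delta_2k * (1.0 + math.sqrt(2.0)) - 1.0)
    return c1, c2


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.2, 0.3, 0.4):
        c1, c2 = theorem_a_constants(d)
        print(f"delta_2k = {d:.1f}: c1 = {c1:.3f}, c2 = {c2:.3f}")
```

At $\delta_{2k} = 0$ the constants are $c_1 = 4$ and $c_2 = 2$; both blow up as $\delta_{2k}$ approaches $\sqrt{2} - 1 \approx 0.414$.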

Theorem A is very powerful, provided that we know the constants $\delta_k$. But because of their combinatorial nature, computing the restricted isometry constants $\delta_k$ is NP-hard [19]. The computational difficulty can be seen as follows. Let $\sigma^2_{\max}(A)$ and $\sigma^2_{\min}(A)$ respectively denote the maximum and minimum squared singular values of a matrix $A$. Define a function $\zeta: \mathbb{R}^{m \times k} \to \mathbb{R}$, where for any $A \in \mathbb{R}^{m \times k}$

$$\zeta(A) = \max\left(\sigma^2_{\max}(A) - 1,\; 1 - \sigma^2_{\min}(A)\right). \qquad (3)$$

Let $S$ denote a size-$k$ subset of indices. Let $\Phi_S$ denote the size $m \times k$ submatrix of $\Phi$ indexed on the column indices in $S$. We then see from (2) that if the columns $\phi_i$ of $\Phi$ are properly normalized, i.e., if $\|\phi_i\|_2 = 1$, then $\delta_k$ satisfies

$$\delta_k = \max_S \zeta(\Phi_S), \qquad (4)$$

where the maximization is taken over all $\binom{n}{k}$ size-$k$ index subsets $S$. For large $n$, the number $\binom{n}{k}$ is huge. To overcome this issue, we may avoid explicitly computing $\delta_k$ by incorporating randomization. Let $A$ denote a random matrix of size $m \times n$. Suppose we sample $\Phi \sim A$. Let $A_S$ denote the size $m \times k$ submatrix of $A$ indexed on $S$. As the mappings $\sigma_{\max}(\cdot)$ and $\sigma_{\min}(\cdot)$ corresponding to singular values are 1-Lipschitz, we have the following well-known measure concentration result.¹

Theorem B, cf. [15, p. 24], [24, p. 18]. Assume $A_S$ is an $m \times k$ random matrix where $k \le m$, and assume that the entries $(A_S)_{ij}$ of $A_S$ are IID with zero mean, i.e., $\mathbb{E}(A_S)_{ij} = 0$. Let every $(A_S)_{ij}$ be either i) Gaussian with variance 1, or ii) a symmetric Bernoulli variable in $\{-1, 1\}$. For every $\epsilon > 0$, we have the probability inequalities

$$\Pr\{\sigma_{\min}(A_S) \le \sqrt{m} - \sqrt{k} - \epsilon\} \le \exp(-\epsilon^2/c_1), \quad \text{and}$$
$$\Pr\{\sigma_{\max}(A_S) \ge \sqrt{m} + \sqrt{k} + \epsilon\} \le \exp(-\epsilon^2/c_1),$$

where $c_1 = 2$ in the Gaussian case, and $c_1 = 16$ in the Bernoulli case.
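Definitions (3) and (4) can be checked by brute force on toy matrices. The Python sketch below (helper names are ours, hypothetical) handles only the case $k = 2$, where the eigenvalues of the $2 \times 2$ Gram matrix $\Phi_S^T\Phi_S$ have a closed form; it enumerates all $\binom{n}{2}$ column pairs, which is exactly the combinatorial cost that becomes prohibitive for large $n$ and $k$.

```python
import itertools
import math


def gram_eigs_2(u, v):
    """Return (min, max) eigenvalues of the 2x2 Gram matrix of columns u and v."""
    a = sum(x * x for x in u)
    c = sum(y * y for y in v)
    b = sum(x * y for x, y in zip(u, v))
    mid = 0.5 * (a + c)
    rad = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mid - rad, mid + rad


def zeta_2(u, v):
    """zeta(A) = max(sigma_max^2(A) - 1, 1 - sigma_min^2(A)) from (3), for a 2-column A."""
    lo, hi = gram_eigs_2(u, v)
    return max(hi - 1.0, 1.0 - lo)


def delta_2(columns):
    """Brute-force delta_2 via (4): maximize zeta over all C(n, 2) column pairs."""
    pairs = itertools.combinations(range(len(columns)), 2)
    return max(zeta_2(columns[i], columns[j]) for i, j in pairs)


if __name__ == "__main__":
    s = math.sqrt(0.5)
    cols = [[1.0, 0.0], [0.0, 1.0], [s, s]]
    print(delta_2(cols))          # approximately 0.7071
    print(math.comb(1000, 10))    # number of subsets in (4) for n = 1000, k = 10
```

For unit-norm columns and $k = 2$, $\zeta(\Phi_S)$ reduces to $|\langle \phi_i, \phi_j \rangle|$, so in this special case $\delta_2$ coincides with the mutual coherence.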

Let $A$ have a measure described in Theorem B; then a union bound over the $\binom{n}{k}$ size-$k$ subsets leads to the following conclusion. Let $\mathbb{1}\{\cdot\}$ denote an indicator function. The probability of sampling $\Phi = (1/\sqrt{m}) \cdot A$ such that the restricted isometry constant of $\Phi$ exceeds $\delta$ is upper bounded as

$$\Pr\Big\{\sum_S \mathbb{1}\{\zeta(m^{-1/2}A_S) > \delta\} > 0\Big\} \le 2\binom{n}{k}\, e^{-m\,\epsilon^2(k/m,\,\delta)/2}, \qquad (5)$$

¹For simplicity, we omitted small deviation constants in Theorem B; see [24, p. 18] for details.
where the constant $\epsilon(k/m, \delta)$ depends only on the ratio $k/m$ and the constant $\delta$.² Then if $mc \ge k(1 + \log(n/k))$ for some constant $c$ strictly less than $\epsilon^2(k/m, \delta)/2$, the RHS of (5) vanishes with increasing $m$ (recall that $\log\binom{n}{k} \le k(1 + \log(n/k))$). One then claims that $\Phi$ has restricted isometry constant at most $\delta$ with "large probability". Note that $\Phi = (1/\sqrt{m}) \cdot A$ does not guarantee the normalization $\|\phi_i\|_2 = 1$ in the Gaussian case, but for simplicity this is usually ignored; see [1], [15].
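The union bound (5) is easy to evaluate numerically. The Python sketch below (function names are ours) takes $\epsilon(k/m, \delta)$ as the largest value permitted by the two singular-value deviation conditions of footnote 2, i.e., the minimum of the two quantities (this is our reading of that footnote), and uses the Gaussian constant $c_1 = 2$ of Theorem B. It returns the natural logarithm of the RHS of (5), so a negative value means the bound is below one and therefore non-vacuous.

```python
import math


def epsilon_rip(k_over_m: float, delta: float) -> float:
    """Largest eps keeping 1 +- sqrt(k/m) +- eps inside [sqrt(1 - delta), sqrt(1 + delta)]."""
    r = math.sqrt(k_over_m)
    eps = min(math.sqrt(1.0 + delta) - (1.0 + r), 1.0 - r - math.sqrt(1.0 - delta))
    if eps <= 0.0:
        raise ValueError("no positive eps: delta too small for this ratio k/m")
    return eps


def log_union_bound(n: int, k: int, m: int, delta: float) -> float:
    """Natural log of the RHS of (5): log(2 * C(n, k)) - m * eps^2 / 2 (Gaussian case)."""
    eps = epsilon_rip(k / m, delta)
    return math.log(2.0) + math.log(math.comb(n, k)) - m * eps * eps / 2.0


if __name__ == "__main__":
    for m in (250, 500, 1000, 2000):
        print(m, round(log_union_bound(n=1000, k=5, m=m, delta=0.9), 2))
```

For $n = 1000$, $k = 5$ and $\delta = 0.9$ the bound is vacuous at $m = 500$ but already far below one at $m = 2000$, illustrating how the RHS of (5) vanishes once $m$ is a sufficiently large multiple of $k(1 + \log(n/k))$.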

The recovery guarantee of Theorem A involves "worst-case" analysis. As seen from the union bound (5), if any one submatrix $A_S$ satisfies $\zeta(m^{-1/2}A_S) > \delta$, the whole matrix $A$ is deemed to have restricted isometry constant strictly larger than $\delta$. Still, such union bounds are conceptually simple and thus commonly employed in "worst-case" CS analysis. They are very useful: in past seminal works they established optimal compression rates and powerful recovery guarantees [2], [18]-[20]. Hence, it is our interest to present a result that addresses the tightness of union bound analyses. The question of bound tightness has already been investigated in past works [20], [21]. However, this work stands apart by utilizing a mathematical apparatus that is able to theoretically calculate the tightness of such union bounds. Such a result would have important theoretical implications, and may potentially void the need for empirical studies and ad-hoc methods. To our knowledge, such a result has never been discussed before in the context of CS.

The said apparatus is called U-statistics, whose concept is introduced in the next subsection. U-statistical theory is very well studied, and the related work [12] discusses a different application to "average-case" recovery guarantees.³

B. U-statistics 

U-statistics were invented in the late 1940s by Hoeffding as a theory for non-parametric testing [25]. A function $\zeta: \mathbb{R}^{m \times k} \to \mathbb{R}$ is said to be a kernel if for any $A, A' \in \mathbb{R}^{m \times k}$ we have $\zeta(A) = \zeta(A')$ whenever the matrix $A'$ can be obtained from $A$ by column reordering. U-statistics are associated with functions $g: \mathbb{R}^{m \times k} \times \mathbb{R} \to \{0, 1\}$ known as indicator kernels. In this paper we only consider indicator kernels $g$ of the form $g(A, a) = \mathbb{1}\{\zeta(A) > a\}$. Examples of indicators can be constructed with $\zeta$ equal to (3), as well as $\zeta = \sigma^2_{\max}$ and $\zeta = \sigma^2_{\min}$. The following definition slightly differs from that of [12].

Definition 1 (Indicator Kernel U-Statistics). Let $A$ be a random matrix with $n$ columns. Let $\Phi$ be sampled as $\Phi \sim A$. Let $g: \mathbb{R}^{m \times k} \times \mathbb{R} \to \{0, 1\}$ be an indicator kernel. For any $a \in \mathbb{R}$, the quantity

$$U_n(a) = \binom{n}{k}^{-1} \sum_S g(\Phi_S, a) \qquad (6)$$

is a U-statistic of the sampled realization $\Phi = A$, corresponding to the kernel $g$. In (6), the matrix $\Phi_S$ is the submatrix of

²More specifically, the constant $\epsilon(k/m, \delta)$ needs to satisfy $0 < \epsilon(k/m, \delta) \le \min\left(\sqrt{1 + \delta} - (1 + \sqrt{k/m}),\; 1 - \sqrt{k/m} - \sqrt{1 - \delta}\right)$. Note that $\delta$ must result in both quantities being positive.

³While [12] (and references therein) proposed certain benefits of "average-case" analysis, we emphasize that "worst-case" analyses are useful for the important reasons stated in the text. Also, certain applications may strictly require guarantees in the "worst-case" sense.
$\Phi$ indexed on the column indices in $S$, and the sum takes place over all size-$k$ subsets $S$ of $\{1, 2, \cdots, n\}$. Note that $0 \le U_n(a) \le 1$.
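Definition (6) translates directly into code. In this Python sketch (names ours, hypothetical) the kernel is any function of a list of columns that is invariant to column reordering; as an example we use $\zeta = \sigma^2_{\max}$ for $k = 2$, computed from the closed-form eigenvalues of a $2 \times 2$ Gram matrix.

```python
import itertools
import math


def u_statistic(columns, k, kernel, a):
    """U_n(a) of (6): fraction of size-k column subsets S with kernel(Phi_S) > a."""
    n = len(columns)
    hits = sum(1 for S in itertools.combinations(range(n), k)
               if kernel([columns[i] for i in S]) > a)
    return hits / math.comb(n, k)


def sigma2_max_2(cols):
    """sigma_max^2 of a 2-column matrix: largest eigenvalue of its 2x2 Gram matrix."""
    u, v = cols
    a = sum(x * x for x in u)
    c = sum(y * y for y in v)
    b = sum(x * y for x, y in zip(u, v))
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)


if __name__ == "__main__":
    s = math.sqrt(0.5)
    cols = [[1.0, 0.0], [0.0, 1.0], [s, s]]
    for a in (0.5, 1.5, 2.0):
        print(a, u_statistic(cols, 2, sigma2_max_2, a))
```

Note that $U_n(a) > 0$ exactly when $\max_S \zeta(\Phi_S) > a$, which is the "worst-case" event studied in the next section.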

III. POISSON APPROXIMATION THEOREM: "WORST-CASE" BEHAVIOR

This section states the U-statistical result that allows us to compute the tightness of union bounds used in "worst-case" analysis. For illustration, here we use restricted isometries.

One way of putting restricted isometries (4) in the U-statistical context is to set $\zeta$ as in (3) and consider the event $\{U_n(a) = 0\}$ that occurs when $\zeta(A_S)$ never exceeds $a$, i.e.,

$$\Pr\{U_n(a) = 0\} = \Pr\Big\{\max_S \zeta(A_S) \le a\Big\}. \qquad (7)$$

However, here we follow [12] and consider the maximum and minimum squared singular values separately. For $\zeta = \sigma^2_{\max}$ and $\zeta = -\sigma^2_{\min}$, respectively, we consider the complementary events (of the LHS of (7))

$$\Pr\{U_n(a) > 0\} = \Pr\Big\{\max_S \sigma^2_{\max}(A_S) > a\Big\},$$
$$\Pr\{U_n(-a) > 0\} = \Pr\Big\{\max_S -\sigma^2_{\min}(A_S) > -a\Big\}. \qquad (8)$$

For $a = 1 + \delta_k$ and $a = 1 - \delta_k$, respectively, these two events correspond to violation of the upper and lower inequalities of (2). Observe how these events are similar to those in the previous union bound (5).

Techniques developed for estimating $\Pr\{U_n(a) = 0\}$, see (7), fall under the umbrella term Poisson approximation; see [22], [26], [27]. The terminology comes from similarities with the Poisson limit of a binomial distribution; see [22, ch. 1]. To illustrate the last point, consider the special case $k = 1$, and let $\phi_i$ denote the $i$-th column of $\Phi$. Then $U_n(a)$ equals the average $U_n(a) = \frac{1}{n}\sum_{i=1}^{n} g(\phi_i, a)$, where we consider subsets $S$ of size 1 of the form $S = \{i\}$, i.e., $\Phi_S = \phi_i$. Suppose we sample $\Phi = A$, where the random matrix $A$ has $n$ IID columns $A_i$. Then for any $a$, $g(A_i, a)$ is Bernoulli distributed with probability $p(a)$; recall $p(a) = \mathbb{E}\,g(A_i, a) = \Pr\{\zeta(A_i) > a\}$. Furthermore, $n\,U_n(a)$ has the binomial distribution, whose probability of taking the value $j$ is well approximated by $(np(a))^j e^{-np(a)}/j!$ for small probability $p(a)$ and index $j$; see [22, ch. 1]. The previous expression is nothing but the Poisson probability mass function with parameter $np(a)$.
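The $k = 1$ argument can be checked directly: $n\,U_n(a)$ is Binomial$(n, p(a))$, and below its probability mass function is compared with the Poisson one with parameter $np(a)$. The Python sketch (names ours) reports the largest pointwise gap over $j = 0, \dots, 20$; the case $j = 0$ compares $\Pr\{U_n(a) = 0\} = (1 - p(a))^n$ with $e^{-np(a)}$.

```python
import math


def binomial_pmf(j: int, n: int, p: float) -> float:
    """Pr{n U_n(a) = j} when n U_n(a) ~ Binomial(n, p)."""
    return math.comb(n, j) * p**j * (1.0 - p) ** (n - j)


def poisson_pmf(j: int, lam: float) -> float:
    """Poisson probability mass function with parameter lam."""
    return lam**j * math.exp(-lam) / math.factorial(j)


def max_pmf_gap(n: int, p: float, jmax: int = 20) -> float:
    """Largest |Binomial(n, p) - Poisson(np)| pmf difference over j = 0..jmax."""
    lam = n * p
    return max(abs(binomial_pmf(j, n, p) - poisson_pmf(j, lam)) for j in range(jmax + 1))


if __name__ == "__main__":
    # Same mean n p(a) = 1 with decreasing p(a): the Poisson fit improves.
    for n, p in ((10, 0.1), (100, 0.01), (1000, 0.001)):
        print(n, p, max_pmf_gap(n, p))
```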

For $k = 1$, Poisson approximation naturally holds for U-statistics, but the extension to general $k > 1$ is non-trivial. Let both $S$ and $\mathcal{R}$ denote index subsets of size $k$. Define

$$q_i(a) = \Pr\{\zeta(A_S) > a,\ \zeta(A_{\mathcal{R}}) > a\} \qquad (9)$$

for $1 \le i \le k - 1$ and subsets $S, \mathcal{R}$ where $|S \cap \mathcal{R}| = i$. Let $\lambda_n(a)$ denote the sum of all tail probabilities, i.e., let $\lambda_n(a) = \binom{n}{k} p(a)$.
Theorem 2.N, cf. [22, p. 35]. Let $A$ be an $m \times n$ random matrix whose columns $A_i$ are IID. Let $\zeta$ be a kernel that maps $\mathbb{R}^{m \times k} \to \mathbb{R}$. Let $g$ be an indicator kernel that maps $\mathbb{R}^{m \times k} \times \mathbb{R} \to \{0, 1\}$ and satisfies $g(A, a) = \mathbb{1}\{\zeta(A) > a\}$, and let $p(a) = \mathbb{E}\,g(A_S, a) = \mathbb{E}\,U_n(a)$. Let $U_n(a)$ be a U-statistic of the sampled realization $\Phi = A$ corresponding to the indicator kernel $g$. For all $1 \le i \le k - 1$, let $q_i(a)$ be defined as in (9). Let $\lambda_n(a) = \binom{n}{k} p(a)$. For some $a \in \mathbb{R}$ such that $p(a) > 0$,



LIM AND STOJANOVIC: ON U-STATISTICS AND COMPRESSED SENSING II: NON-ASYMPTOTIC WORST-CASE ANALYSIS 






[Fig. 1 panels: (a) $k = 2$, $m = 5$, horizontal axis "Max. Sq. Singular Value $a$"; (b) horizontal axis "Min. Sq. Singular Value $a$".]

Fig. 1. Gaussian measure. The empirical tail probability $\Pr\{\max_S \zeta(A_S) > a\}$ is shown, where $\zeta = \sigma^2_{\max}$ in (a) and $\zeta = -\sigma^2_{\min}$ in (b), respectively corresponding to the maximum and minimum squared singular values.

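the probability $\Pr\{U_n(a) = 0\} = \Pr\{\max_S \zeta(A_S) \le a\}$ is approximated by the function $\exp(-\lambda_n(a))$ as follows:

$$\left|\Pr\Big\{\max_S \zeta(A_S) \le a\Big\} - \exp(-\lambda_n(a))\right| \le e_n(a),$$

where the approximation error $e_n(a)$ is given as

$$e_n(a) = \left(1 - e^{-\lambda_n(a)}\right)\left\{\left[\binom{n}{k} - \binom{n-k}{k}\right] p(a) + \sum_{r=1}^{k-1} \binom{k}{r}\binom{n-k}{k-r}\, p(a)^{-1}\, q_r(a)\right\}. \qquad (10)$$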



In the sequel, Theorem 2.N will lead to calculating the tightness of union bounds. The proof uses Stein-Chen techniques and is rather lengthy, thus we refer the reader to [22, ch. 2]. Similar to Theorem 1 presented in [12], Theorem 2.N also requires IID columns. The quantities (8) of interest are approximated by the function $1 - \exp(-\lambda_n(a))$ up to the error $e_n(a)$ in (10). Note that Theorem 2.N is a non-asymptotic result because of its explicit dependence on the system sizes $k, m, n$.
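Formula (10) is straightforward to evaluate once $p(a)$ and the joint tails $q_r(a)$ are known or bounded. The Python sketch below (function name ours) returns $e_n(a)$ together with $1 - e^{-\lambda_n(a)}$ and the union bound $\lambda_n(a)$; the numerical inputs in the usage example are illustrative placeholders, not values taken from the paper.

```python
import math


def poisson_approx_error(n: int, k: int, p: float, q: list) -> tuple:
    """Evaluate (10). q[r-1] holds q_r(a) for r = 1..k-1. Returns (e_n, 1 - exp(-lam), lam)."""
    if p <= 0.0:
        raise ValueError("Theorem 2.N requires p(a) > 0")
    if len(q) != k - 1:
        raise ValueError("need exactly k - 1 joint probabilities q_1(a), ..., q_{k-1}(a)")
    total = math.comb(n, k)
    lam = total * p                                   # lambda_n(a), the union bound
    first = (total - math.comb(n - k, k)) * p         # overlapping-subset term
    second = sum(math.comb(k, r) * math.comb(n - k, k - r) * q[r - 1] / p
                 for r in range(1, k))                # joint-tail term
    scale = 1.0 - math.exp(-lam)
    return scale * (first + second), scale, lam


if __name__ == "__main__":
    e_n, approx, lam = poisson_approx_error(n=25, k=2, p=1e-4, q=[1e-7])
    print(f"lambda_n = {lam:.4f}, 1 - exp(-lambda_n) = {approx:.5f}, e_n = {e_n:.2e}")
```

For $k = 1$ the joint term disappears and (10) reduces to $(1 - e^{-np(a)})\,p(a)$, matching the binomial-to-Poisson picture of the $k = 1$ discussion.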

First, we illustrate Theorem 2.N using some simulation results. We draw $5 \times n$ random matrices $A$, where the columns $A_i$ are drawn IID. For $\zeta = \sigma^2_{\max}$, Figure 1(a) shows the tail distribution $\Pr\{\max_S \sigma^2_{\max}(A_S) > a\}$. This is obtained by empirical simulation, performed for $k = 2$ and two block lengths $n = 10$ and $n = 25$. Figure 1(a) reveals reasonably good approximation for all shown values of $a$ (compared to the function $1 - e^{-\lambda_n(a)}$), within a factor of 2-4. The approximation is observed to improve in the higher part of the tails (i.e., for larger values of $a$). For $\zeta = -\sigma^2_{\min}$, Figure 1(b) presents similar empirical comparisons for $\Pr\{\min_S \sigma^2_{\min}(A_S) < a\}$ in (8). In this case we notice better approximation; the differences become hardly noticeable. The extremely small $k, m, n$ values chosen in this experiment suggest that Theorem 2.N works well for non-asymptotics.
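A small Monte Carlo experiment in the spirit of Figure 1(a) can be sketched as follows. The assumptions are ours and are not stated in the text: Gaussian entries with variance $1/m$, $k = 2$, and $p(a)$ estimated by pooling all column pairs over all trials. The function (name hypothetical; it uses the standard-library `random.Random.gauss`) returns the empirical $\Pr\{\max_S \sigma^2_{\max}(A_S) > a\}$, the Poisson prediction $1 - e^{-\hat{\lambda}_n(a)}$, and the plug-in union bound $\hat{\lambda}_n(a)$; by construction the empirical probability never exceeds $\hat{\lambda}_n(a)$.

```python
import itertools
import math
import random


def sigma2_max_pair(u, v):
    """Largest eigenvalue of the 2x2 Gram matrix of columns u and v."""
    a = sum(x * x for x in u)
    c = sum(y * y for y in v)
    b = sum(x * y for x, y in zip(u, v))
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def simulate_tail(n=10, m=5, a=3.0, trials=1000, seed=1):
    """Empirical worst-case tail, Poisson prediction, and plug-in union bound (k = 2)."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    worst_hits = 0
    pair_hits = 0
    for _ in range(trials):
        cols = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)] for _ in range(n)]
        exceed = [sigma2_max_pair(cols[i], cols[j]) > a for i, j in pairs]
        pair_hits += sum(exceed)
        worst_hits += any(exceed)
    p_hat = pair_hits / (trials * len(pairs))   # estimate of p(a)
    lam_hat = len(pairs) * p_hat                # lambda_n(a) = C(n, 2) p(a)
    return worst_hits / trials, 1.0 - math.exp(-lam_hat), lam_hat


if __name__ == "__main__":
    for a in (2.0, 3.0, 4.0):
        est, approx, lam = simulate_tail(a=a)
        print(f"a={a}: empirical={est:.4f}  1-exp(-lambda)={approx:.4f}  lambda={lam:.4f}")
```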

The quantity $\lambda_n(a)$ defined above is in fact a tail probability union bound. Notice that $\Pr\{U_n(a) > 0\} \le \lambda_n(a)$, see (5), and we display $\lambda_n(a)$ in Figure 1. We claim that Theorem 2.N can in fact be used to evaluate the tightness of the union bound, and shows us (if at all) how much the bound can be improved. This is because, even though the error (10) is given w.r.t. $1 - e^{-\lambda_n(a)}$ and not $\lambda_n(a)$, these functions are close in the region of interest, i.e., $1 - e^{-\lambda_n(a)} = \lambda_n(a) + o(\lambda_n(a))$ as $\lambda_n(a) \to 0$.

Union bound analyses only make sense when $\lambda_n(a)$ is small. The first term $\left[\binom{n}{k} - \binom{n-k}{k}\right] p(a)$ in (10) must then also be small, since it is at most $\lambda_n(a)$. We are mostly concerned with the second term, which depends on the joint tail probabilities $q_i(a)$. Using the fact that $q_{k-1}(a) \ge q_i(a)$ for $i < k - 1$, see [26, Lemma 1], we further derive another useful form of (10)



enia) < (l-e-^"('^))|p(a) 
'e(2fc- 1 



+ 2'' ■ 



k-1 



n ^ k 
k 

2k-l 



2k ~ 1 



<lk-i{a), 
(11) 



which only depends on a single joint term $q_{k-1}(a)$. The exact details of the derivation are given in Appendix A. The constant term in front of $q_{k-1}(a)$ has exponent at most $(2k-1)\cdot[1 + \log(n/(k-1))]$. This easily follows from the bound $(2k-1)/(k-1) = 2 + 1/(k-1) \ge 2$ and simple arithmetic. Hence, as the exponent of $\binom{n}{k}$ (in front of $p(a)$) in $\lambda_n(a)$ is at most $k(1 + \log(n/k))$, the error (11) will be small if we can ensure that the joint probability $q_{k-1}(a)$ drops "twice as fast" as $p(a)$.
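To see what "twice as fast" means numerically, the Python sketch below (names ours) compares $\log\binom{n}{k}$ with its cap $k(1 + \log(n/k))$, and with the cap $(2k-1)[1 + \log(n/(k-1))]$ on the logarithm of the constant in front of $q_{k-1}(a)$; for moderately large $k$ the second cap is close to twice the first.

```python
import math


def exponent_caps(n: int, k: int) -> tuple:
    """Return (log C(n,k), k(1 + log(n/k)), (2k - 1)(1 + log(n/(k - 1)))) for 2 <= k <= n."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    log_binom = math.log(math.comb(n, k))
    marginal_cap = k * (1.0 + math.log(n / k))
    joint_cap = (2 * k - 1) * (1.0 + math.log(n / (k - 1)))
    return log_binom, marginal_cap, joint_cap


if __name__ == "__main__":
    for k in (2, 5, 20, 50):
        lb, mc, jc = exponent_caps(1000, k)
        print(f"k={k:3d}  log C={lb:8.2f}  k(1+log(n/k))={mc:8.2f}  ratio={jc / mc:.3f}")
```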

In the next section, we evaluate the (non-asymptotic) approximation error $e_n(a)$ given by Theorem 2.N for two important CS parameters taken from well-studied "worst-case" analyses: restricted isometries (see Subsection IV-A) and mutual coherence (see Subsection IV-B). The main theorems and conclusions will be given for both cases. We leave the more complicated null-space property for future work.

IV. Poisson Approximation Error & Non-asymptotic Probability Estimates

A. Restricted isometries case 

We focus on the case of restricted isometries, i.e., when $\zeta$ is set equal to $\sigma^2_{\max}$ and $-\sigma^2_{\min}$, respectively, as in (8). Estimates for the joint tail probabilities $q_i(a)$ are not as well addressed as the marginals $\Pr\{\zeta(A_S) > a\}$ (denoted $p(a)$). The following proposition presents such estimates. Also, for any two Bernoulli distributions with probabilities $a$ and $b$, let $\mathcal{D}(a\|b)$ denote the binary information divergence, i.e., $\mathcal{D}(a\|b) = a\log(a/b) + (1-a)\log((1-a)/(1-b))$. Note that $\log$ here indicates the natural log. For any matrix $A$ with entries $a_{ij}$, the $i$-th row outer product of $A$ equals the $k \times k$ matrix with entries $a_{ij} \cdot a_{i\ell}$.
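The binary information divergence $\mathcal{D}(a\|b)$ appears in all tail bounds of this subsection. A direct Python transcription (function name ours) is shown below, using the usual convention $0\log 0 = 0$ at the endpoints $a \in \{0, 1\}$.

```python
import math


def binary_divergence(a: float, b: float) -> float:
    """D(a||b) = a log(a/b) + (1-a) log((1-a)/(1-b)), natural log, with 0 log 0 = 0."""
    if not (0.0 <= a <= 1.0) or not (0.0 < b < 1.0):
        raise ValueError("need 0 <= a <= 1 and 0 < b < 1")
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return total


if __name__ == "__main__":
    print(binary_divergence(0.5, 0.25))   # equals 0.5 * log(4/3)
    print(binary_divergence(0.3, 0.3))    # zero when both probabilities agree
```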

Proposition 1. Let $A$ be an $m \times n$ random matrix whose columns $A_i$ are identically distributed. Let every entry $A_{ij}$ of $A$ satisfy the bound $|A_{ij}| \le 1/\sqrt{m}$, so that the columns $A_i$ are normalized as $\|A_i\|_2 \le 1$. Let the rows $[A_{i1}, A_{i2}, \cdots, A_{in}]$ of $A$ be IID.

Let $S, \mathcal{R}$ be size-$k$ index subsets such that $S$ and $\mathcal{R}$ intersect in exactly $i$ positions, i.e., $|S \cap \mathcal{R}| = i$. Let $X$ and $Y$ equal the first row outer products of the matrices $\sqrt{m/k}\,A_S$ and $\sqrt{m/k}\,A_{\mathcal{R}}$, respectively. Let $\tau_q$ denote a constant that satisfies



The counterpart result for the marginal case is stated in Theorem C.



$$\tau_q \cdot \mathrm{Tr}(C)\,\mathrm{Tr}(D) \ge \mathbb{E}\left[\mathrm{Tr}(CX)\,\mathrm{Tr}(DY)\right], \qquad (12)$$



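where $C, D$ can be any positive semidefinite matrices of size $k \times k$, and $\mathrm{Tr}(\cdot)$ denotes the trace. Also define the constants $\tau_{p,\max}$ and $\tau_{p,\min}$ as follows: $\tau_{p,\max} = \varsigma_{\max}(\mathbb{E}X)$ and $\tau_{p,\min} = \varsigma_{\min}(\mathbb{E}X)$, where $\varsigma_{\max}$ and $\varsigma_{\min}$ denote the maximum and minimum eigenvalues, respectively.

Theorem C, cf. [29, Thm. 5.1]. Let the assumptions on the matrix $A$, the index subset $S$, and the matrix $X$ be the same as in Proposition 1. Let $\tau_{p,\max} = \varsigma_{\max}(\mathbb{E}X)$ and $\tau_{p,\min} = \varsigma_{\min}(\mathbb{E}X)$. Let $\mathcal{D}(\cdot\|\cdot)$ denote the binary information divergence. Then the following tail probability bounds hold:

$$\Pr\{\sigma^2_{\max}(A_S) \ge a\} \le k\, e^{-m\,\mathcal{D}(a/k \,\|\, \tau_{p,\max})},$$
$$\Pr\{\sigma^2_{\min}(A_S) \le a\} \le k\, e^{-m\,\mathcal{D}(a/k \,\|\, \tau_{p,\min})}, \qquad (17)$$

for the respective limits $k\,\tau_{p,\max} \le a \le k$ and $0 \le a \le k\,\tau_{p,\min}$.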

Assume $\max(\tau_{p,\max}, \tau_{p,\min}) < \tau_q$. Let $\mathcal{D}(\cdot\|\cdot)$ denote the binary information divergence. For all $1 \le i < k$ such that $|S \cap \mathcal{R}| = i$, the joint tail probability bounds (13) (see the bottom of the page) hold for $k\,\tau_{p,\max} \le a \le k$ and $0 \le a \le k\,\tau_{p,\min}$, respectively, where the constant $c_3 = c_3(a, k, c_1, c_2)$ satisfies



03(0, fc, Ci,C2) = - log 



C4 



log C2(l+C4)2 + (1-C2) 



and the constant a = c^i^a, k, ci, C2) satisfies 




04(0, fc,Ci,C2) 




4(c2-^-l)(l-f)f 1 



2.(15) 



(l-ci)2 

Proposition 1 provides upper bounds on $q_i(a) = \Pr\{\zeta(A_S) > a,\ \zeta(A_{\mathcal{R}}) > a\}$ for both cases $\zeta = \sigma^2_{\max}$ and $\zeta = -\sigma^2_{\min}$. This result requires an estimate for the constant $\tau_q$ in (12). While Proposition 1 does not assume independent columns $A_i$, Theorem 2.N does. Under this additional column independence assumption, we claim that we can take

$$\tau_q = \frac{\beta}{k} \quad \text{and} \quad \tau_{p,\max} = \tau_{p,\min} = \frac{\beta'}{k} \qquad (16)$$

for some constants $\beta, \beta'$. The latter claim is easily verified: in this case $\mathbb{E}X$ (see Proposition 1) equals an identity matrix scaled by $\beta'/k$, where specifically $\beta' = m \cdot \mathbb{E}[A_{ij}^2]$. If the $A_{ij}$ are Bernoulli in $\{-1/\sqrt{m}, 1/\sqrt{m}\}$, then $\beta' = 1$, and we claim $\beta = 3$. The former claim will be clarified in the upcoming Proposition 2 in this subsection. Note that to meet the $\tau_q$ condition $\max(\tau_{p,\max}, \tau_{p,\min}) < \tau_q$ in the above Proposition 1, we require $\beta' < \beta$ in (16); this is satisfied in the Bernoulli case. The proof of Proposition 1 uses a technique called the Ahlswede-Winter method [28], which results in the factor $k^2$ appearing in (13).



The exponents in the bounds for the joint case (13) seem to be twice those of the marginal case (17). This would be true if the constant $c_3$ in (13) and (14) is small, and if $\tau_q \approx \tau_{p,\max}$ and $\tau_q \approx \tau_{p,\min}$ (the latter two conditions are true if $\beta \approx \beta'$ in (16) above). Recall that having (13) drop twice as fast as (17) is excellent from the standpoint of achieving a small Poisson approximation error $e_n(a)$. Figure 2(a) seems to suggest that the exponent of (13) becomes double that of (17). Here we plot the exponents within the $\exp(\cdot)$ terms in both (13) and (17), where the exponent of (13) is "halved" for easier comparison (meaning that it is the exponent after factoring out $-2m$, whereas for (17) we only factor out $-m$). The $\sigma^2_{\max}$ and $\sigma^2_{\min}$ cases are respectively shown for different $a$ values in the ranges $a > 1$ and $a < 1$, according to the given ranges of $a$ (note $k\,\tau_{p,\max} = k\,\tau_{p,\min} = 1$). As becomes more apparent as $k$ increases from 4 to 20, the plotted ("halved") exponents of (13) are close to those of (17).
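The marginal exponent in (17) can be reproduced in a few lines. This Python sketch (names ours) assumes the Bernoulli values of (16), $\tau_{p,\max} = \tau_{p,\min} = 1/k$ (i.e., $\beta' = 1$), and evaluates $\mathcal{D}(a/k\|1/k)$ for the two fixed values $a = 3/2$ and $a = 1/2$ used in Figure 2(b); multiplying by $k$ shows that, for fixed $a$, this exponent behaves like a constant divided by $k$. It covers only the marginal bound (17), not the joint bound (13).

```python
import math


def binary_divergence(a: float, b: float) -> float:
    """D(a||b) with natural log and the convention 0 log 0 = 0 (assumes 0 < b < 1)."""
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return total


def marginal_exponent(a: float, k: int, beta_prime: float = 1.0) -> float:
    """Exponent D(a/k || tau_p) of (17) with tau_p = beta'/k as in (16)."""
    return binary_divergence(a / k, beta_prime / k)


if __name__ == "__main__":
    for k in (4, 8, 12, 16, 20):
        d_max = marginal_exponent(1.5, k)   # sigma_max^2 case, a = 3/2
        d_min = marginal_exponent(0.5, k)   # sigma_min^2 case, a = 1/2
        print(f"k={k:2d}  D_max={d_max:.4f} (k*D={k * d_max:.3f})  "
              f"D_min={d_min:.4f} (k*D={k * d_min:.3f})")
```

For $a = 3/2$, $k\,\mathcal{D}(a/k\|1/k)$ approaches $1.5\log 1.5 - 0.5 \approx 0.108$ as $k$ grows, which is the reciprocal-in-$k$ decay discussed below.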

The previous discussion is only aimed at developing intuition, and is not a proof of any sort. We now evaluate the constant $c_3$ more carefully. For the $\zeta = \sigma^2_{\max}$ case (the $\zeta = -\sigma^2_{\min}$ case follows similarly), we notice the following from the claim (16): i) $c_1$ is proportional to $1/k$, where c1 = Tq/Tp^i^^^, and ii) $c_2$ is constant, where $c_2 = \tau_{p,\max}/\tau_q$. In particular, for the Bernoulli case we will have $c_1 = 3/k$ and $c_2 = 1/3$. Under i) and ii) we claim that for large $k$, the constant $c_3$ is approximately $a/k$ (omitting a constant factor). This implies that $c_3$ gets smaller for fixed $a$ and increasing $k$, supporting the discussion in the previous paragraph. To show this, we first argue that (omitting constant factors) $c_4$ is also approximately $a/k$. From (15), by the Taylor approximation $\sqrt{1 + \alpha} = 1 + \alpha/2 + o(\alpha)$ for $|\alpha| < 1$, we have $c_4 = \alpha/4 + o(\alpha)$, where here



, /34-(l-f)f /a 
a (\ 



1 + ci 



(18) 



Pr{ai,(*5) > a,<TL,($7z) > a} < A;^ cxp -m • 2 2? - 



' p,max 



+ C3 a, fc, 



' p,max 



Pr{aL(*5) < a, ^^^^(^tj) < a} < fc^ exp -m • 2 2? 



' p,niin 



+ C3 a, fc, 



(13) 



' p,min 
Fig. 2. Bernoulli measure. In (a), the exponents of the joint and marginal tail probabilities (13) and (17) are plotted with respect to various $a$ values. In (b), the same exponents are shown with respect to various $k$ (where $a$ is fixed).

for /34 — 4(c^^ — 1), and ii) implies (3^ to be some positive 
constant. Thus C4 = (/?4/4) • {a/k) + o(l/fc). Moving on to 
(O, the above discussion shows that (04 + a/k) /{a/k) = 
/34/4 + o(l), and 



(1 + C4)^ 



1- ^ 



1 + ci - 2 



a 
k 



where the final identity follows similarly as in (fTSl l. Then C3 = 

/?3 • (a/fc) + o(l/fc), where /Sa = + log((/?4/4) + o(l)) for 
constants Pa, (3 . Hence the claim that C3 drops reciprocally in 
k follows. 

Unfortunately, under our assumptions, the above argument that $c_3$ is small is insufficient to show that, for fixed $a$, the quantity $q_{k-1}(a)$ drops twice as fast as $p(a)$. This is because one can show (similarly as we did above) that $\mathcal{D}(a/k\|c_1)$ also drops reciprocally in $k$ whenever $c_1$ is proportional to $1/k$. That is, for fixed $a$, the exponents of both the joint and marginal bounds (13) and (17) get smaller as $k$ increases, as illustrated in Figure 2(b). This figure plots essentially the same exponents shown in Figure 2(a), but here $a$ is fixed to two values ($a = 3/2$ for $\sigma^2_{\max}$ and $a = 1/2$ for $\sigma^2_{\min}$), and the horizontal axis is now w.r.t. $k$. We see that the exponents shown drop at an approximate rate of $1/k$ with increasing $k$. While the techniques behind Theorem C (and also Proposition 1) are simple, as pointed out in [29], they do worse than Theorem B if the columns $A_i$ are IID (which we assume in Theorem 2.N). Nevertheless, we can show that the Poisson approximation error $e_n(a)$ in (11) drops if we allow $a$ to grow. As mentioned before, the main concern is the exponent of the second term in (11), where the constant term in front of $q_{k-1}(a)$ has exponent (at most) $(2k-1)\cdot[1 + \log(n/(k-1))]$. Take the exponent of $q_{k-1}(a)$ (w.r.t. $-2m$) to be $\mathcal{D}(a/k\|c_1) + c_3$. Approximate as before $\mathcal{D}(a/k\|c_1) = \beta_{\mathcal{D}} \cdot (a/k) + O(1/k)$ for some constant $\beta_{\mathcal{D}}$, and also as before $c_3 = \beta_3 \cdot (a/k) + o(1/k)$; hence we require⁴



/3 



f- 

\k 



> 



k-l 
2 



1 + log 



1 



where /3 = (S-d + /^s- Simply taking k — 1/2 and fc — 1 as 
k (assuming moderately large fc), we essentially proved the 
following main result of this subsection. 

Theorem 1. Assume that the columns Ai of A are IID. 

Consider the error e„ (a) in UOi for the restricted isometrics 
case, i.e., C = cr^^^ or C, — — Cj^in- Assume the terms Tq, Tp^max 
and Tp. niin f<2s defined in Proposition Q]) satisfy ( I-/6D . Let 
m — ti ■ k{l + log((n/fc))) for some constant ti. Then the 
error e„(a) in ( liOD will exponentially drop in m, if we set 
a ~ (^2//^) ■ k, where constants t2,(3 satisfying ^2 < <^nd 
P = Pt) + i^a with Pxi ond P3 corresponding to respective 
approximations of the above terms I?(a//c||ci) and C3, and 
where k is sufficiently large. 

Explicit constants could be obtained by more careful bookkeeping, but this is not done here for brevity. Note that we require $k$ to be sufficiently large to allow the previous $O(1/k)$
term to become small enough (i.e., to allow t2 + 0{l/k) < 
t^^). Incidentally, recall that previous Figure |2ja) showed 
joint ("halved") and marginal exponents for as k increases. 

We conjecture that Theorem 1 can be improved to hold for fixed $a$; we leave this to future research. This conjecture is inspired by recent work [21], in which, for fixed values of $a$ that satisfy recovery guarantees (similar to Theorem A; see equation (17) in [21]), experimental validation of the theoretical union bounds presented in [19] (similar to (5)) has been performed. The reader is referred to [21] for these results, performed for an undersampling ratio $m/n = 1/4$ and a wide range of values $m = 250$-$2000$ and $n = 1000$-$8000$. Also, to support this conjecture for a smaller $m = 50$, we present experimental results in the Supplementary Material.

The rest of the subsection discusses the proof techniques. Although Theorem 1 looks significantly more complicated than Theorem C, the proof techniques, which follow the Ahlswede-Winter method, are very similar. To facilitate the proof of Proposition 1, we first present the proof of Theorem C. Our proof is slightly different from (and simpler than) that in [29], due to the further simplifying assumption that the rows of $A$ are IID; in [29] the independence assumption⁵ also holds, but the identical-distribution assumption is not necessary.

We ignored a $\log k$ term (due to the factor $k^2$ in (13)) on the RHS.
Like most results that require independence, there is a natural generalization to martingales, e.g., see [29].

Consider the $m \times k$ submatrix $A_S$ of $A$. Express the product $A_S^T A_S$ as a scaled average of $m$ random matrices $X_i$, i.e., $A_S^T A_S = \frac{k}{m}\sum_{i=1}^m X_i$, whereby $X_i$ is the outer product of the $i$-th row of $\sqrt{m/k}\cdot A_S$. Clearly, if $|S| = 1$, then each $X_i$ becomes a scalar RV and $A_S^T A_S$ becomes simply an average of scalar RVs. The Ahlswede–Winter method is essentially a concentration result for sums of random matrices. Let $S_m$ denote the matrix sum $S_m = \sum_{i=1}^m X_i$. Write $A_S^T A_S = \frac{k}{m} S_m$ and express the tail probabilities as

$$\Pr\{\sigma^2_{\max}(A_S) > a\} = \Pr\{\varsigma_{\max}(S_m) > ma/k\}, \text{ and}$$

$$\Pr\{\sigma^2_{\min}(A_S) < a\} = \Pr\{\varsigma_{\max}(-S_m) > -ma/k\},$$

so that it suffices to look only at the maximal eigenvalue function $\varsigma_{\max}$; that is, we only need to treat the quantities $\varsigma_{\max}(S_m)$ and $\varsigma_{\max}(-S_m)$. For a real, symmetric matrix $A$, let $e^A$ denote the matrix exponential $e^A = \sum_{l=0}^{\infty} A^l/l!$. If $\alpha$ is an eigenvalue of $A$, then $e^{\alpha}$ is an eigenvalue of $e^A$. By convexity of the function $x \mapsto e^{hx}$, the inequality $e^{h\alpha} \le 1 + (e^h - 1)\cdot\alpha$ holds for all $h \in \mathbb{R}$ and $0 \le \alpha \le 1$. Let $I$ denote the identity matrix. For any real, symmetric, positive semidefinite matrix $A$ with $\varsigma_{\max}(A) \le 1$, the properties of the matrix exponential and the inequality $e^{h\alpha} \le 1 + (e^h - 1)\cdot\alpha$ imply that for any $h \in \mathbb{R}$



$$I + (e^h - 1)A - e^{hA} \text{ is positive semidefinite.} \qquad (19)$$
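As a numerical sanity check (not part of the original analysis), the sketch below builds the decomposition $A_S^TA_S = \frac{k}{m}S_m$ under an assumed normalization (Bernoulli entries $\pm 1/\sqrt{m}$, with $X_i$ the outer product of the $i$-th row of $\sqrt{m/k}\cdot A_S$) and checks (19) on random positive semidefinite matrices with spectrum in $[0, 1]$. The helper `sym_expm` is a hypothetical name; it computes the matrix exponential from an eigendecomposition, which is valid only for symmetric inputs.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a real symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(0)
m, k = 40, 5

# Assumed setup: Bernoulli entries +-1/sqrt(m), X_i = outer product of row i of sqrt(m/k) * A_S
A_S = rng.choice([-1.0, 1.0], size=(m, k)) / np.sqrt(m)
R = np.sqrt(m / k) * A_S
X = np.einsum("ij,il->ijl", R, R)                        # X[i] = outer(R[i], R[i])
gram_ok = np.allclose(A_S.T @ A_S, (k / m) * X.sum(axis=0))
lam_max = max(np.linalg.eigvalsh(Xi).max() for Xi in X)  # largest eigenvalue over all X_i

# (19) on generic PSD matrices whose spectrum lies in [0, 1]
min_eig = np.inf
for _ in range(20):
    G = rng.standard_normal((k, k))
    A = G @ G.T
    A /= np.linalg.eigvalsh(A).max()                     # rescale so that 0 <= eigenvalues <= 1
    for h in (-2.0, -0.5, 0.5, 2.0):
        B = np.eye(k) + (np.exp(h) - 1.0) * A - sym_expm(h * A)
        min_eig = min(min_eig, np.linalg.eigvalsh(B).min())

print(gram_ok, lam_max, min_eig)
```

Under this normalization each $X_i$ is rank one with eigenvalue exactly 1, so (19) is trivially tight for the $X_i$ themselves; this is why the check of (19) uses generic matrices instead.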



Because $S_m = \sum_{i=1}^m X_i$ is a sum of outer products, $S_m$ is positive semidefinite. For any real, symmetric matrix $A$, the matrix exponential $e^A$ is positive definite, hence positive semidefinite. Also, for a positive semidefinite matrix $A$, we have $\mathrm{Tr}(A) \ge \varsigma_{\max}(A)$. For any $h, t > 0$, we have



$$\Pr\{\varsigma_{\max}(S_m) > t\} \le \Pr\{\mathrm{Tr}(e^{hS_m}) > e^{ht}\} \le e^{-ht}\,\mathbb{E}\,\mathrm{Tr}(e^{hS_m}), \qquad (20)$$



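where the first inequality follows because $e^{hS_m}$ is positive semidefinite with largest eigenvalue $e^{h\varsigma_{\max}(S_m)}$, and the second inequality follows from Markov's inequality. Similarly, for any $h, t > 0$ we also have

$$\Pr\{\varsigma_{\max}(-S_m) > -t\} \le e^{ht}\,\mathbb{E}\,\mathrm{Tr}(e^{-hS_m}).$$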
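The chain (20) holds pointwise: whenever $\varsigma_{\max}(S_m) > t$ we have $\mathrm{Tr}(e^{hS_m}) \ge e^{h\varsigma_{\max}(S_m)} > e^{ht}$. The Monte Carlo sketch below, assuming the same Bernoulli normalization as above and arbitrary illustrative values of $m$, $k$, $h$ and $t$, compares the empirical tail frequency with the sample version of $e^{-ht}\,\mathbb{E}\,\mathrm{Tr}(e^{hS_m})$. Because the inequality holds sample by sample, the sample average of the bound always dominates the empirical frequency.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, trials = 30, 4, 2000
h, t = 0.5, 15.0  # illustrative choices of h > 0 and threshold t

hits = 0
trace_mgf = 0.0
for _ in range(trials):
    A_S = rng.choice([-1.0, 1.0], size=(m, k)) / np.sqrt(m)
    S_m = (m / k) * (A_S.T @ A_S)        # S_m = sum_i X_i under the assumed scaling
    eig = np.linalg.eigvalsh(S_m)
    hits += int(eig.max() > t)
    trace_mgf += np.exp(h * eig).sum()   # Tr(e^{h S_m}) computed from the eigenvalues

empirical = hits / trials
bound = np.exp(-h * t) * trace_mgf / trials  # e^{-ht} times the sample mean of Tr(e^{h S_m})
print(empirical, bound)
```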

The proof of Theorem C relies on the following lemma, shown using the fact (19).

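Lemma 1. Let $X$ be a random, positive semidefinite matrix that satisfies $\varsigma_{\max}(X) \le 1$. Let $C$ be any positive semidefinite matrix of the same size as $X$. Then for any $h > 0$ we have the following inequalities

$$\mathrm{Tr}(Ce^{hX}) \le \mathrm{Tr}(C) + (e^h - 1)\,\mathrm{Tr}(CX),$$
$$\mathrm{Tr}(Ce^{-hX}) \le \mathrm{Tr}(C) + (e^{-h} - 1)\,\mathrm{Tr}(CX). \qquad (21)$$

Taking expectation, we also have

$$\mathbb{E}\{\mathrm{Tr}(Ce^{hX})\} \le \mathrm{Tr}(C)\left(e^h\tau_{p,\max} + (1 - \tau_{p,\max})\right),$$
$$\mathbb{E}\{\mathrm{Tr}(Ce^{-hX})\} \le \mathrm{Tr}(C)\left(e^{-h}\tau_{p,\min} + (1 - \tau_{p,\min})\right), \qquad (22)$$

where $\tau_{p,\max} = \varsigma_{\max}(\mathbb{E}X)$ and $\tau_{p,\min} = \varsigma_{\min}(\mathbb{E}X)$.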
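A quick numerical check of the two inequalities in (21), assuming a random positive semidefinite $C$ and a random positive semidefinite $X$ rescaled so that $\varsigma_{\max}(X) = 1$; the helper names are illustrative. Both gaps (right-hand side minus left-hand side) should be nonnegative up to floating-point rounding.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def random_psd(k, rng, unit_top=False):
    """Random PSD matrix G G^T; optionally rescaled so its largest eigenvalue is 1."""
    G = rng.standard_normal((k, k))
    P = G @ G.T
    if unit_top:
        P /= np.linalg.eigvalsh(P).max()
    return P

rng = np.random.default_rng(2)
k = 6
worst_gap = np.inf
for _ in range(50):
    C = random_psd(k, rng)
    X = random_psd(k, rng, unit_top=True)
    for h in (0.1, 1.0, 3.0):
        # gap of the first inequality in (21): RHS - LHS
        gap1 = np.trace(C) + (np.exp(h) - 1) * np.trace(C @ X) - np.trace(C @ sym_expm(h * X))
        # gap of the second inequality in (21)
        gap2 = np.trace(C) + (np.exp(-h) - 1) * np.trace(C @ X) - np.trace(C @ sym_expm(-h * X))
        worst_gap = min(worst_gap, gap1, gap2)
print(worst_gap)
```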



The proof of Lemma 1 is relegated to Appendix B. To show Theorem C we also need the Golden–Thompson inequality [28], which states that for any two real, symmetric matrices $A$ and $B$, we have $\mathrm{Tr}(e^{A+B}) \le \mathrm{Tr}(e^A e^B)$. The proof of Theorem C is also furnished in Appendix B.
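The Golden–Thompson inequality can be spot-checked numerically. The sketch below draws random real symmetric $A$ and $B$ (an arbitrary choice of test distribution) and records $\mathrm{Tr}(e^{A+B})/\mathrm{Tr}(e^Ae^B)$, which should never exceed 1 beyond rounding error.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def random_symmetric(k, rng):
    M = rng.standard_normal((k, k))
    return (M + M.T) / 2

rng = np.random.default_rng(3)
k = 5
ratios = []
for _ in range(100):
    A, B = random_symmetric(k, rng), random_symmetric(k, rng)
    lhs = np.trace(sym_expm(A + B))
    rhs = np.trace(sym_expm(A) @ sym_expm(B))  # positive, since it equals Tr(e^{B/2} e^A e^{B/2})
    ratios.append(lhs / rhs)
print(max(ratios))
```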

Proposition 1 for the joint case is proved similarly using Lemma 1. Consider two size-$k$ subsets $S$ and $\mathcal{R}$ that intersect in exactly $i$ positions, i.e., $|S \cap \mathcal{R}| = i$. In addition to $S_m$, similarly define another matrix $T_m = \sum_{j=1}^m Y_j$, where each $Y_j$ is a $k \times k$ matrix. Also, similar to $X_j$, let $Y_j$ equal the $j$-th row outer product of the matrix $\sqrt{m/k}\cdot A_{\mathcal{R}}$. Recall the joint Markov inequality: for two nonnegative RVs $A$ and $B$ and for any $t_1, t_2 > 0$, we have $\Pr\{A > t_1, B > t_2\} \le \mathbb{E}\{AB\}/(t_1t_2)$. Applying similar reasoning as in (20), we get for $h_1, h_2 > 0$

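$$\Pr\{\varsigma_{\max}(S_m) > t, \varsigma_{\max}(T_m) > t\} \le \Pr\{\mathrm{Tr}(e^{h_1S_m}) > e^{h_1t}, \mathrm{Tr}(e^{h_2T_m}) > e^{h_2t}\} \le e^{-t(h_1+h_2)}\,\mathbb{E}\{\mathrm{Tr}(e^{h_1S_m})\mathrm{Tr}(e^{h_2T_m})\}, \text{ and}$$

$$\Pr\{\varsigma_{\max}(-S_m) > -t, \varsigma_{\max}(-T_m) > -t\} \le e^{t(h_1+h_2)}\,\mathbb{E}\{\mathrm{Tr}(e^{-h_1S_m})\mathrm{Tr}(e^{-h_2T_m})\}. \qquad (23)$$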

Let $X, Y$ denote random, positive semidefinite matrices of equal size. For any positive semidefinite matrices $C, D$ of the same size as $X$, apply (21) in Lemma 1 and use arguments similar to those in its proof (see Appendix B) to show that, for $h_1, h_2 > 0$, (24) (displayed below) holds, where the constant $\tau_q$ satisfies (12), i.e., $\tau_q\cdot\mathrm{Tr}(C)\mathrm{Tr}(D) \ge \mathbb{E}\,\mathrm{Tr}(CX)\mathrm{Tr}(DY)$. We are now ready to prove Proposition 1; the proof is given in detail in Appendix B.

To finish up this subsection, we address how to compute a constant $\tau_q$ that satisfies the hypothesis (12) required in Proposition 1.

Proposition 2. Let $X$ be the outer product of the row $[A_1, A_2, \dots, A_k]$ of $k$ RVs $A_i$. Let $Y$ be the outer product of the row $[B_1, B_2, \dots, B_k]$ of $k$ RVs $B_i$. For some positive integer $c \le k$, assume i) $A_i = B_i$ for $i \le c$, ii) the RVs $A_i$ are IID, and iii) the RVs $B_i$ are IID. Let $X_{ij}$ and $Y_{ij}$ denote the matrix entries of $X$ and $Y$, respectively; note that $X_{ij} = A_iA_j$ and $Y_{ij} = B_iB_j$. Assume $\mathbb{E}A_i = \mathbb{E}B_i = 0$. Then



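$$\mathbb{E}X_{ij}Y_{\ell\omega} = \begin{cases} \mathbb{E}A_1^4, & \text{if i) holds},\\ (\mathbb{E}A_1^2)^2, & \text{if ii) holds},\\ (\mathbb{E}A_1^2)^2, & \text{if iii) holds},\\ 0, & \text{otherwise}, \end{cases} \qquad (25)$$

where the above conditions i)–iii) are as follows:

i) $i = j = \ell = \omega$ and $1 \le i \le c$.

ii) $i = j$ and $\ell = \omega$, with either $i \neq \ell$ or $i = \ell > c$, and $1 \le i, \ell \le k$.

iii) $i \neq j$, $\{\ell, \omega\} = \{i, j\}$, and $1 \le i, j \le c$.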



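$$\mathbb{E}\{\mathrm{Tr}(Ce^{h_1X})\mathrm{Tr}(De^{h_2Y})\} \le \mathrm{Tr}(C)\mathrm{Tr}(D)\left\{\tau_q(e^{h_1}-1)(e^{h_2}-1) + \tau_{p,\max}(e^{h_1}-1) + \tau_{p,\max}(e^{h_2}-1) + 1\right\},$$
$$\mathbb{E}\{\mathrm{Tr}(Ce^{-h_1X})\mathrm{Tr}(De^{-h_2Y})\} \le \mathrm{Tr}(C)\mathrm{Tr}(D)\left\{\tau_q(e^{-h_1}-1)(e^{-h_2}-1) + \tau_{p,\min}(e^{-h_1}-1) + \tau_{p,\min}(e^{-h_2}-1) + 1\right\}. \qquad (24)$$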
Also, for any positive semidefinite matrices $C, D$ of size $k \times k$, we have the following inequality

$$\mathbb{E}\,\mathrm{Tr}(CX)\mathrm{Tr}(DY) \le \left[\max\left(\mathbb{E}A_1^4, (\mathbb{E}A_1^2)^2\right) + 2(\mathbb{E}A_1^2)^2\right]\cdot\mathrm{Tr}(C)\mathrm{Tr}(D).$$

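Proof: The RVs $A_1, A_2, \dots, A_k, B_{c+1}, B_{c+2}, \dots, B_k$ are IID, and $A_i = B_i$ for $i \le c$. Because $X_{ij} = A_iA_j$ and $Y_{\ell\omega} = B_\ell B_\omega$, we have $X_{ij}Y_{\ell\omega} = A_iA_jB_\ell B_\omega$. Assume there exists at least one index (say $i$) that does not equal any of the other indices (say $j, \ell, \omega$); then $\mathbb{E}X_{ij}Y_{\ell\omega} = 0$ (in this case $\mathbb{E}A_iA_jB_\ell B_\omega = (\mathbb{E}A_i)(\mathbb{E}A_jB_\ell B_\omega) = 0$ by our independence assumption and the assumption $\mathbb{E}A_i = 0$). That is, under our assumptions, the only cases in which $\mathbb{E}X_{ij}Y_{\ell\omega} \neq 0$ are outlined in (25).

Let $c_{ij}$ and $d_{ij}$ denote the matrix entries of $C$ and $D$, respectively. We get that

$$\mathbb{E}\,\mathrm{Tr}(CX)\mathrm{Tr}(DY) = \sum_{i,j=1}^{k}\sum_{\ell,\omega=1}^{k} c_{ij}\,\mathbb{E}(X_{ij}Y_{\ell\omega})\,d_{\ell\omega} = \sum_{i,\ell=1}^{k} c_{ii}\,\mathbb{E}(X_{ii}Y_{\ell\ell})\,d_{\ell\ell} + 2\sum_{\substack{i,j=1\\ i\neq j}}^{c} c_{ij}\,\mathbb{E}(X_{ij}Y_{ij})\,d_{ij}. \qquad (26)$$

By the assumed positive semidefiniteness of $C$ and $D$, we have $c_{ii} \ge 0$ and $d_{\ell\ell} \ge 0$. Also, $\mathbb{E}X_{ii}Y_{\ell\ell} \ge 0$ for all $i, \ell$, see (25). Hence the first term of (26) is upper bounded by $(\max_{i,\ell}\mathbb{E}X_{ii}Y_{\ell\ell})\cdot\mathrm{Tr}(C)\mathrm{Tr}(D)$. By our assumptions, for all $i \neq j$ with $i, j \le c$, we have $\mathbb{E}X_{ij}Y_{ij} = (\mathbb{E}A_1^2)^2$. The second term of (26) therefore equals $2(\mathbb{E}A_1^2)^2\sum_{i\neq j\le c}c_{ij}d_{ij} \le 2(\mathbb{E}A_1^2)^2\cdot\mathrm{Tr}(C_cD_c)$, where $C_c$ and $D_c$ denote the leading $c \times c$ principal submatrices of $C$ and $D$ (the omitted diagonal terms $c_{ii}d_{ii}$ are nonnegative). Finally, $\mathrm{Tr}(C_cD_c) \le \mathrm{Tr}(C_c)\mathrm{Tr}(D_c) \le \mathrm{Tr}(C)\mathrm{Tr}(D)$ by positive semidefiniteness, so this upper estimate is independent of the constant $c$. ■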
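For Rademacher entries ($A_i \in \{\pm 1\}$, so $\mathbb{E}A_1^4 = (\mathbb{E}A_1^2)^2 = 1$), the expectation in (26) can be computed exactly by enumerating all sign patterns. The sketch below assumes $k = 3$, $c = 2$, random positive semidefinite $C$ and $D$, and $B_i$ for $i > c$ independent of everything else. It compares the enumeration with the right-hand side of (26) and with the bound $3\,\mathrm{Tr}(C)\mathrm{Tr}(D)$ of Proposition 2.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
k, c = 3, 2  # B_i = A_i for i <= c; B_{c+1}, ..., B_k are independent copies (assumed setup)

def random_psd(k):
    G = rng.standard_normal((k, k))
    return G @ G.T

C, D = random_psd(k), random_psd(k)

# Exact expectation over all Rademacher sign patterns of A_1..A_k and B_{c+1}..B_k
total, count = 0.0, 0
for signs in itertools.product([-1.0, 1.0], repeat=k + (k - c)):
    a = np.array(signs[:k])
    b = np.concatenate([a[:c], np.array(signs[k:])])
    X, Y = np.outer(a, a), np.outer(b, b)
    total += np.trace(C @ X) * np.trace(D @ Y)
    count += 1
exact = total / count

# Right-hand side of (26): every E X_ii Y_ll equals 1 and E X_ij Y_ij = 1 for i != j <= c
formula = np.trace(C) * np.trace(D) + 2 * sum(
    C[i, j] * D[i, j] for i in range(c) for j in range(c) if i != j
)

# Proposition 2 bound with E A^4 = (E A^2)^2 = 1
bound = 3 * np.trace(C) * np.trace(D)
print(exact, formula, bound)
```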
We now verify the claim $\tau_q = O(1/k^2)$ in (18) when the columns $A_i$ are independent. Take $X$ and $Y$ to be the row outer products of $\sqrt{m/k}\cdot A_S$ and $\sqrt{m/k}\cdot A_{\mathcal{R}}$ as in Proposition 1, with bounded entries $|X_{ij}| \le 1/k$ and $|Y_{ij}| \le 1/k$. Use these $X$ and $Y$ in Proposition 2 to conclude that we can choose $\tau_q = \max(\mathbb{E}X_{11}^2, (\mathbb{E}X_{11})^2) + 2(\mathbb{E}X_{11})^2$, which must be of the form $O(1/k^2)$ since $|X_{11}| \le 1/k$. Finally, we comment that with independent columns, the condition $\max(\tau_{p,\max}, \tau_{p,\min}) < \sqrt{\tau_q}$ in Proposition 1 is easily satisfied. Recall that this condition is equivalent to $\beta' < \sqrt{\beta}$, see (18), and simply take $\tau_q =$
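As an illustration, assume Bernoulli entries $\pm 1/\sqrt{m}$ so that $X_{11} = 1/k$ deterministically. Then the formula above gives $\tau_q = 3/k^2$, while $\tau_{p,\max} = \tau_{p,\min} = 1/k < \sqrt{\tau_q}$. The tiny sketch below (hypothetical helper name) evaluates these quantities for one value of $k$.

```python
import math

def tau_q_from_prop2(EX11, EX11_sq):
    """Hypothetical helper: max(E X_11^2, (E X_11)^2) + 2 (E X_11)^2, as in Proposition 2."""
    return max(EX11_sq, EX11**2) + 2 * EX11**2

k = 8
EX11, EX11_sq = 1 / k, 1 / k**2   # assumed Bernoulli case: X_11 = 1/k deterministically
tq = tau_q_from_prop2(EX11, EX11_sq)
tau_p = 1 / k                      # largest and smallest eigenvalue of E X are both 1/k here
print(tq, 3 / k**2, tau_p < math.sqrt(tq))
```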

B. Mutual coherence case 

Next, to emphasize the generality of the Poisson approximation Theorem 2.N, we demonstrate a different application. In some early seminal work before the introduction of restricted isometry-type analyses, a different CS parameter was considered. Let $\Phi$ denote a matrix with $n$ columns $\phi_i$ that satisfy the normalization $\|\phi_i\|_2 = 1$. The mutual coherence (or simply coherence) of such a matrix $\Phi$ is measured by the following quantity

$$\max_{1 \le i < j \le n} |\phi_i^T\phi_j|. \qquad (27)$$



By definition, the mutual coherence is a number $a \in \mathbb{R}$ between 0 and 1. CS recovery guarantees are obtainable from knowledge of (27), see e.g., [3]–[5], [30], whereby the guarantees get stronger as the coherence gets smaller. We can relate the coherence to restricted isometry using the Gershgorin circle theorem. Let $\zeta$ equal the function in (?). As mentioned in [?, p. 2], for a matrix $A$ with $k$ columns, all of unit norm, we have $\zeta(A) \le (k - 1)\cdot a$, where $a$ equals the mutual coherence (27) of $A$. However, the coherence of an $m \times n$ matrix cannot be very small; it is at least $\sqrt{(n-m)/(m(n-1))}$, see [5]. Many techniques, e.g., [?], [?], [?], [13], involve the mutual coherence; thus, also for the sake of demonstrating the utility of U-statistic theory, we devote this small subsection to the Poisson approximation of mutual coherence. While [27] recently considered a more complicated analysis for a more complicated setting, the exposition here is original, simplified, and framed in the context of CS. Here we only consider size-2 subsets $S$. Define the kernel $\zeta: \mathbb{R}^{m\times 2} \to \mathbb{R}$ as

$$\zeta(A) = \frac{|a_1^T a_2|}{\|a_1\|_2\cdot\|a_2\|_2}, \qquad (28)$$

where $A$ has two columns $a_1$ and $a_2$. Here (unlike the restricted isometry case) we make an effort to normalize properly. For an $m \times n$ matrix $A$, the statistic $\max_S \zeta(A_S)$ equals the mutual coherence (27) of $A$ (after normalizing its columns). Let $g$ be the indicator kernel satisfying $g(A, a) = 1\{\zeta(A) > a\}$. Then the corresponding U-statistic $U_n(a)$ with this indicator kernel $g$ is related to the mutual coherence because $\{U_n(a) = 0\} = \{\max_S \zeta(A_S) \le a\}$. The mutual coherence is also a "worst-case" statistic, similar to the restricted isometries, and so the concept of Poisson approximation applies similarly. To apply Theorem 2.N, we require estimates for $p(a)$ and $q_r(a)$; in this case $|S| = 2$, so we only have $q_1(a)$.
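A direct computation of the kernel (28), the mutual coherence, and the event identity $\{U_n(a) = 0\} = \{\max_S \zeta(A_S) \le a\}$, sketched for a small Gaussian matrix. The helper names are hypothetical, and `exceed_count` returns the unnormalized number of exceeding pairs rather than the normalized U-statistic, which has the same zero set.

```python
import itertools
import numpy as np

def zeta(a1, a2):
    """Kernel (28): normalized absolute inner product of two columns."""
    return abs(a1 @ a2) / (np.linalg.norm(a1) * np.linalg.norm(a2))

def mutual_coherence(A):
    """max_S zeta(A_S) over all size-2 column subsets S, i.e. (27) after column normalization."""
    n = A.shape[1]
    return max(zeta(A[:, i], A[:, j]) for i, j in itertools.combinations(range(n), 2))

def exceed_count(A, a):
    """Hypothetical helper: number of size-2 subsets S with zeta(A_S) > a."""
    n = A.shape[1]
    return sum(int(zeta(A[:, i], A[:, j]) > a) for i, j in itertools.combinations(range(n), 2))

rng = np.random.default_rng(5)
A = rng.standard_normal((20, 12))
mu = mutual_coherence(A)
# {U_n(a) = 0} is the same event as {max_S zeta(A_S) <= a}
print(mu, exceed_count(A, mu), exceed_count(A, 0.9 * mu))
```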

Both $p(a)$ and $q_1(a)$ are estimated similarly. Let $A$ be an $m \times n$ random matrix, and assume its columns $A_i$ are IID. Denote the probability $f(a, \mathbf{b})$ as

$$f(a, \mathbf{b}) = \Pr\left\{\frac{|A_1^T\mathbf{b}|}{\|A_1\|_2} > a\right\}. \qquad (29)$$



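Let $S = \{i_1, i_2\}$ and $\mathcal{R} = \{i_2, i_3\}$, so that $S \cap \mathcal{R} = \{i_2\}$; by conditioning on $A_{i_2}$ we have

$$p(a) = \Pr\{\zeta(A_S) > a\} = \mathbb{E}\,f\!\left(a, \frac{A_{i_2}}{\|A_{i_2}\|_2}\right),$$
$$q_1(a) = \Pr\{\zeta(A_S) > a, \zeta(A_{\mathcal{R}}) > a\} = \mathbb{E}\,f\!\left(a, \frac{A_{i_2}}{\|A_{i_2}\|_2}\right)^2. \qquad (30)$$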

The following proposition provides an exponential bound for $f(a, \mathbf{b})$ in (29). Here, $\mathbf{b}$ is any vector in $\mathbb{R}^m$ with $\|\mathbf{b}\|_2 = 1$. We prove the following result only for Gaussian and Bernoulli matrices, due to slight complications introduced by the normalization by $\|A_1\|_2$ in (29).



Proposition 3. Let $A_1$ be a length-$m$ random vector with IID zero-mean entries, where each entry is either Gaussian with variance $1/m$, or Bernoulli on $\{-1/\sqrt{m}, 1/\sqrt{m}\}$. The probability $f(a, \mathbf{b})$ in (29), for $\|\mathbf{b}\|_2 = 1$, is upper bounded as $f(a, \mathbf{b}) \le 2\exp(-m\cdot a^2/2)$.
We defer the proof for a moment. Using Proposition 3 in (30), substituting $A_{i_2}/\|A_{i_2}\|_2 = \mathbf{b}$, we get $p(a) \le 2\exp(-m\cdot a^2/2)$ and $q_1(a) \le 4\exp(-m\cdot a^2)$.
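The bounds on $p(a)$ and $q_1(a)$ can be compared with Monte Carlo estimates. The sketch below assumes Bernoulli columns with entries $\pm 1/\sqrt{m}$ and the illustrative values $m = 50$, $a = 0.3$; for such columns $\zeta(A_S) = |A_{i_1}^TA_{i_2}|$, and both estimates should fall below the corresponding bounds.

```python
import numpy as np

rng = np.random.default_rng(6)
m, a, trials = 50, 0.3, 20000

# Three IID Bernoulli columns per trial with entries +-1/sqrt(m) (unit norm, so zeta = |inner product|)
cols = rng.choice([-1.0, 1.0], size=(trials, 3, m)) / np.sqrt(m)
z_S = np.abs(np.einsum("ti,ti->t", cols[:, 0], cols[:, 1]))  # zeta(A_S), S = {i1, i2}
z_R = np.abs(np.einsum("ti,ti->t", cols[:, 1], cols[:, 2]))  # zeta(A_R), R = {i2, i3}

p_hat = np.mean(z_S > a)
q1_hat = np.mean((z_S > a) & (z_R > a))
p_bound = 2 * np.exp(-m * a**2 / 2)
q1_bound = 4 * np.exp(-m * a**2)
print(p_hat, p_bound, q1_hat, q1_bound)
```

For Bernoulli columns, $f(a, \mathbf{b})$ does not depend on $\mathbf{b}$ when $\mathbf{b}$ itself has entries $\pm 1/\sqrt{m}$, so in this particular setting $q_1(a) = p(a)^2$, which the simulation should roughly reflect.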

Thus the exponent of $q_1(a)$ is twice as large as that of $p(a)$, which suggests a small Poisson approximation error $\epsilon_n(a)$. Here (unlike the restricted isometries case) we do not use (11), but instead use the other bound (32) in the appendix. We can verify the following result by setting $k = 2$, and further bounding $\binom{n}{2} - \binom{n-2}{2} \le 2n - 3$ and $\binom{n}{2}\binom{n-2}{1}\binom{2}{1} \le n^3$.



Theorem 2. Let $A$ be an $m \times n$ random matrix whose columns $A_i$ are IID and either Gaussian or Bernoulli distributed as described in Proposition 3. Denote $\lambda_n(a) = (n(n-1)/2)\cdot p(a)$. Then the Poisson approximation error $\epsilon_n(a)$ is upper bounded as

$$\epsilon_n(a) \le \left(1 - e^{-\lambda_n(a)}\right)(4n - 6)\exp\left(-\frac{ma^2}{2}\right) + (4n^3)\exp(-ma^2). \qquad (31)$$
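The right-hand side of (31) is straightforward to evaluate. The sketch below (hypothetical helper names) computes it alongside the $k = 2$ instance of (32) with $p(a) \le 2e^{-ma^2/2}$ and $q_1(a) \le 4e^{-ma^2}$ plugged in. As an additional assumption, $\lambda_n(a)$ is evaluated with the upper bound on $p(a)$, which only enlarges the factor $1 - e^{-\lambda_n(a)}$.

```python
import math

def lam_upper(n, m, a):
    """C(n,2) * 2 exp(-m a^2/2): lambda_n(a) with p(a) replaced by its upper bound (assumption)."""
    return math.comb(n, 2) * 2 * math.exp(-m * a**2 / 2)

def rhs_31(n, m, a):
    """Right-hand side of (31)."""
    first = (1 - math.exp(-lam_upper(n, m, a))) * (4 * n - 6) * math.exp(-m * a**2 / 2)
    return first + 4 * n**3 * math.exp(-m * a**2)

def rhs_32_k2(n, m, a):
    """(32) with k = 2, after plugging in p(a) <= 2 exp(-m a^2/2) and q_1(a) <= 4 exp(-m a^2)."""
    p = 2 * math.exp(-m * a**2 / 2)
    q1 = 4 * math.exp(-m * a**2)
    first = (1 - math.exp(-lam_upper(n, m, a))) * (math.comb(n, 2) - math.comb(n - 2, 2)) * p
    second = math.comb(n, 2) * math.comb(2, 1) * math.comb(n - 2, 1) * q1
    return first + second

m, a = 50, 0.8
for n in (100, 200, 500, 1000):
    print(n, rhs_32_k2(n, m, a), rhs_31(n, m, a))
```

The first terms coincide exactly; the second term of (31) is larger only because $n(n-1)(n-2)$ is replaced by $n^3$.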



Theorem 2 indicates that the mutual coherence is well predicted by union bounds. Since $p(a) \le 2\exp(-m\cdot a^2/2)$, $\lambda_n(a)$ is at most $n^2\exp(-ma^2/2)$, and $\lambda_n(a)$ drops exponentially in $m$ if $a > \sqrt{(4\log n)/m}$; this is a standard estimate*, see [?]. As before, we are mostly concerned with the second term in (31), which requires a weaker condition on $a$ to drop exponentially. More specifically, we only need $a > \sqrt{(\log 4 + 3\log n)/m}$ (weaker than the previous condition as long as $n > 4$). To conclude this subsection, we show the proof of Proposition 3.
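The two threshold conditions on $a$ can be tabulated directly. The sketch below uses $m = 50$ and the block lengths of Figure 3; the accompanying check confirms that the second threshold is the smaller one for every $n > 4$, with equality at $n = 4$.

```python
import math

def thresholds(n, m):
    a_lambda = math.sqrt(4 * math.log(n) / m)                  # lambda_n(a) decays for a above this
    a_second = math.sqrt((math.log(4) + 3 * math.log(n)) / m)  # 4 n^3 exp(-m a^2) decays above this
    return a_lambda, a_second

m = 50
for n in (100, 200, 500, 1000):
    a_lambda, a_second = thresholds(n, m)
    print(n, round(a_lambda, 3), round(a_second, 3))
```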

Proof of Proposition 3: For notational simplicity let $X = A_1/\|A_1\|_2$. We will show $\Pr\{X^T\mathbf{b} > a\} \le \exp(-m\cdot a^2/2)$; the other case $\Pr\{X^T\mathbf{b} < -a\} \le \exp(-m\cdot a^2/2)$ follows by symmetry of the distribution of $X$. First consider the case where the entries of $A_1$ are Gaussian distributed; then $X$ is uniformly distributed on the surface of the $m$-dimensional unit hypersphere. For any $m \times m$ orthogonal matrix $C$, i.e., $C^TC = I$, the vector $CX$ has the same distribution as $X$. Since $\|\mathbf{b}\|_2 = 1$, we can choose an orthogonal $C \in \mathbb{R}^{m\times m}$ such that $C\mathbf{b} = [1, 0, \dots, 0]^T$; then $X^T\mathbf{b} = (CX)^T(C\mathbf{b}) = (CX)_1$, and hence $\Pr\{X^T\mathbf{b} > a\} = \Pr\{X_1 > a\}$. Since $\|X\|_2 = 1$, $\Pr\{X_1 > a\}$ is proportional to the surface area of the spherical cap $\{x_1 > a : \|\mathbf{x}\|_2 = 1\}$. This probability is upper bounded by $(1 - a^2)^{m/2}$, see [32, p. XIII-3], which is in turn upper bounded by $\exp(-m\cdot a^2/2)$.

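In the case where the entries of $A_1$ are Bernoulli distributed, $\|A_1\|_2 = 1$ and the entries $X_i$ of $X$ are independent. For sums $S_m = \sum_{i=1}^m Y_i$ of independent zero-mean RVs $Y_i$ with $|Y_i| \le c_i$, see [33, eqn. (2.6)], we have $\Pr\{S_m > ma\} \le \exp(-m^2a^2/(2\|\mathbf{c}\|_2^2))$, where $\mathbf{c} = [c_1, c_2, \dots, c_m]$. Setting $X^T\mathbf{b} = \frac{1}{m}S_m$, i.e., $Y_i = mX_ib_i$, we have $|Y_i| \le \sqrt{m}\cdot|b_i|$, and setting $c_i = \sqrt{m}\cdot|b_i|$ we have $\|\mathbf{c}\|_2^2 = m\cdot\|\mathbf{b}\|_2^2 = m$, since $\|\mathbf{b}\|_2 = 1$. Thus, $\Pr\{X^T\mathbf{b} > a\} \le \exp(-m\cdot a^2/2)$. ■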
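For the Gaussian case, the sketch below estimates $f(a,\mathbf{b})$ by sampling $X = A_1/\|A_1\|_2$ uniformly on the sphere and compares it with twice the spherical-cap bound $(1-a^2)^{m/2}$ and with $2\exp(-ma^2/2)$. The values $m = 50$, $a = 0.3$ and the random unit vector $\mathbf{b}$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
m, a, trials = 50, 0.3, 20000

b = rng.standard_normal(m)
b /= np.linalg.norm(b)                               # an arbitrary unit vector b

G = rng.standard_normal((trials, m)) / np.sqrt(m)     # Gaussian entries with variance 1/m
X = G / np.linalg.norm(G, axis=1, keepdims=True)      # X = A_1 / ||A_1||_2, uniform on the sphere
f_hat = np.mean(np.abs(X @ b) > a)                    # Monte Carlo estimate of f(a, b) in (29)

cap = (1 - a**2) ** (m / 2)                           # one-sided spherical-cap bound
chernoff = np.exp(-m * a**2 / 2)
print(f_hat, 2 * cap, 2 * chernoff)
```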

For the Bernoulli case, Figure 3 shows some empirical evidence that supports the theory derived in this subsection. Here we consider a moderate measurement size $m = 50$, and four different block lengths $n$ of 100, 200, 500 and 1000.



*In [7], this estimate is given as $\sqrt{(2\log n)/m}$; however, we believe that a $\sqrt{2}$ factor has been omitted.



Corresponding to these values of $n$, Figures 3(a)–(d) plot the empirical tail probability of the mutual coherence, see (27). We also plot the function $1 - \exp(-\lambda_n(a))$, where the marginal quantity $p(a)$ in $\lambda_n(a)$ is taken to be $p(a) = 2\exp(-ma^2/2)/(a\sqrt{2\pi m})$, and we expect this function to be close to the tail probability of the mutual coherence. There are two reasons for this. First, from (28) we see that for the Bernoulli case $\zeta(A) = |A_1^TA_2|$, and $A_1^TA_2$ is a sum of $m$ IID Bernoulli $\{-1/m, 1/m\}$ variables, which is approximately Gaussian distributed with variance $1/m$; the above $p(a)$ is the standard Gaussian tail approximation for this variance. Second, by Theorem 2.N and Theorem 2 we expect the mutual coherence to be well approximated by a Poisson distribution. Indeed, we observe for all cases of $n$ shown that the empirical distribution is close to the plotted $1 - \exp(-\lambda_n(a))$, i.e., the union bound is tight. We also plot the error $\epsilon_n(a)$ in (31); the first and second terms are plotted separately. We observe that for values of $a$ greater than the standard estimate $\sqrt{4\log n/m}$, the error values $\epsilon_n(a)$ become insignificantly small for all cases of $n$ shown.
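A Figure 3-style experiment can be sketched as follows, assuming Bernoulli $\pm 1/\sqrt{m}$ matrices with $m = 50$, $n = 100$, a small number of trials, and $p(a) = 2\exp(-ma^2/2)/(a\sqrt{2\pi m})$ (the Gaussian tail approximation for variance $1/m$). It prints the empirical tail of the mutual coherence next to $1 - \exp(-\lambda_n(a))$. This is not the code used to produce the figure, and because $A_1^TA_2$ takes values on a lattice of spacing $2/m$, the empirical curve is a step function.

```python
import numpy as np

def coherence_tail(m, n, a_grid, trials, rng):
    """Empirical Pr{mutual coherence > a} for m x n Bernoulli(+-1/sqrt(m)) matrices."""
    mus = np.empty(trials)
    for t in range(trials):
        A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
        G = np.abs(A.T @ A)          # columns already have unit norm
        np.fill_diagonal(G, 0.0)
        mus[t] = G.max()
    return np.array([np.mean(mus > a) for a in a_grid])

def union_prediction(m, n, a_grid):
    """1 - exp(-lambda_n(a)) with the Gaussian-tail approximation of p(a)."""
    p = 2 * np.exp(-m * a_grid**2 / 2) / (a_grid * np.sqrt(2 * np.pi * m))
    lam = n * (n - 1) / 2 * p
    return 1 - np.exp(-lam)

rng = np.random.default_rng(8)
m, n = 50, 100
a_grid = np.linspace(0.4, 0.8, 9)
emp = coherence_tail(m, n, a_grid, trials=200, rng=rng)
pred = union_prediction(m, n, a_grid)
for a, e, pv in zip(a_grid, emp, pred):
    print(f"a={a:.2f}  empirical={e:.3f}  predicted={pv:.3f}")
```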

V. Conclusion 

This paper takes a first look at U-statistical theory for predicting the "worst-case" behavior of salient CS matrix parameters. We showed how U-statistical theory is able to provide theoretical bounds on the tightness of union bound analyses; such results have not been investigated before in the CS context. We investigated this premise for two important CS parameters: i) restricted isometries and ii) mutual coherence. Our two main theorems establish that union bounds are tight: for i), when $m = O(k(1 + \log(n/k)))$ and the restricted isometry constants grow linearly with sparsity $k$, and for ii), when the mutual coherence threshold is of the order of the standard estimate $\sqrt{(4\log n)/m}$. That is, under the specified conditions, the above two theorems justify the use of simple union bounds for "worst-case" analysis.

We discuss some directions for future work. Firstly, it would also be desirable to improve the analyses in Subsection IV-A to allow the same conclusion for i) above, but with restricted isometry constants that do not depend on $k$. Secondly, it would be interesting to apply the techniques here to the null-space property, from which powerful recovery guarantees can be obtained. Thirdly, one might investigate the tightness of union bound analyses for the case when the sampling matrix columns are dependent; this requires appropriate extensions of Theorem 2.N.

Appendix 

A. Derivation of error estimate (11)

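Here we derive (11) from (10). For some $a$ such that $p(a) > 0$, note from (10) that $\binom{n}{k}q_r(a) = \lambda_n(a)\,p(a)^{-1}q_r(a) \ge \left(1 - e^{-\lambda_n(a)}\right)p(a)^{-1}q_r(a)$, where the inequality follows because $1 - e^{-x} \le x$ for all $x \ge 0$. Hence we can upper estimate the approximation error $\epsilon_n(a)$ given in (10) as follows

$$\epsilon_n(a) \le \left(1 - e^{-\lambda_n(a)}\right)\left[\binom{n}{k} - \binom{n-k}{k}\right]p(a) + \binom{n}{k}\sum_{r=1}^{k-1}\binom{k}{r}\binom{n-k}{k-r}q_r(a). \qquad (32)$$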
Fig. 3. Bernoulli measure. For the mutual coherence case, comparing the empirical tail probability with the "union bound" predicted by $1 - \exp(-\lambda_n(a))$ (obtained using the Gaussian approximation, see text). The error terms refer to (31).



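We use a fact from [26] (see Lemma 1 there) that for $r \le k - 1$ the inequality $q_{k-1}(a) \ge q_r(a)$ holds. By also using $\binom{n-k}{k-1} \ge \binom{n-k}{k-r}$ for all $r \le k - 1$ (valid when $n$ is large relative to $k$), we claim

$$\binom{n}{k}\sum_{r=1}^{k-1}\binom{k}{r}\binom{n-k}{k-r}q_r(a) \le \binom{n}{k}\left[\sum_{r=1}^{k-1}\binom{k}{r}\right]\binom{n-k}{k-1}q_{k-1}(a) < 2^k\binom{2k-1}{k-1}\binom{n}{2k-1}q_{k-1}(a) \le 2^k\left(\frac{e(2k-1)}{k-1}\right)^{k-1}\binom{n}{2k-1}q_{k-1}(a),$$

where the last inequality uses $\binom{N}{j} \le (eN/j)^j$.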



The second-last inequality follows from the identities $\sum_{r=1}^{k}\binom{k}{r} = 2^k - 1$ and $\binom{n}{k}\binom{n-k}{k-1} = \binom{n}{2k-1}\binom{2k-1}{k-1}$.
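The counting facts used in this derivation can be checked exactly with integer arithmetic. The sketch below (hypothetical helper name `check`) verifies, for a few $(n, k)$ pairs, the identity $\binom{n}{k}\binom{n-k}{k-1} = \binom{n}{2k-1}\binom{2k-1}{k-1}$, the value $\sum_{r=1}^{k-1}\binom{k}{r} = 2^k - 2$, the monotonicity $\binom{n-k}{k-r} \le \binom{n-k}{k-1}$, and the standard estimate $\binom{2k-1}{k-1} \le (e(2k-1)/(k-1))^{k-1}$.

```python
from math import comb, e

def check(n, k):
    """Return a tuple of booleans, one per counting fact (requires k >= 2)."""
    identity = comb(n, k) * comb(n - k, k - 1) == comb(n, 2 * k - 1) * comb(2 * k - 1, k - 1)
    partial_sum = sum(comb(k, r) for r in range(1, k)) == 2**k - 2
    monotone = all(comb(n - k, k - r) <= comb(n - k, k - 1) for r in range(1, k))
    estimate = comb(2 * k - 1, k - 1) <= (e * (2 * k - 1) / (k - 1)) ** (k - 1)
    return identity, partial_sum, monotone, estimate

for n, k in [(50, 2), (100, 5), (1000, 10)]:
    print(n, k, check(n, k))
```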



B. Technical proofs of claims appearing in Subsection IV-A

Proof of Lemma 1: Put $B = I + (e^h - 1)X - e^{hX}$. By the linearity of $\mathrm{Tr}(\cdot)$, we have $\mathrm{Tr}(CB) = \mathrm{Tr}(C) + (e^h - 1)\mathrm{Tr}(CX) - \mathrm{Tr}(Ce^{hX})$. Also, since $X$ is positive semidefinite with $\varsigma_{\max}(X) \le 1$, (19) states that $B$ is positive semidefinite. For any two positive semidefinite matrices $C$ and $B$, we have $\mathrm{Tr}(CB) \ge 0$; therefore $\mathrm{Tr}(Ce^{hX}) \le \mathrm{Tr}(C) + (e^h - 1)\mathrm{Tr}(CX)$. Take expectations of both sides. Finally, because $e^h - 1 > 0$, use $\mathrm{Tr}(C\,\mathbb{E}X) \le \mathrm{Tr}(C)\cdot\varsigma_{\max}(\mathbb{E}X)$ to prove the first inequality of (22).

To show the second inequality, put $B = I + (e^{-h} - 1)X - e^{-hX}$; by (19), the matrix $B$ is still positive semidefinite. The rest of the arguments follow similarly to the first case; however, note that in this case $e^{-h} - 1 < 0$, therefore we use $\mathrm{Tr}(C\,\mathbb{E}X) \ge \mathrm{Tr}(C)\cdot\varsigma_{\min}(\mathbb{E}X)$ to finish the proof of the second inequality of (22). ■
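Proof of Theorem C: In this proof, we set $S_m = \sum_{i=1}^m X_i$, where $X_i$ is the $i$-th row outer product of $\sqrt{m/k}\cdot A_S$, or simply $A_S^TA_S = \frac{k}{m}S_m$. By the boundedness assumption on the entries $A_{ij}$, we have $\varsigma_{\max}(X_i) \le 1$. By (20) we have $\Pr\{\varsigma_{\max}(S_m) > t\} \le e^{-ht}\,\mathbb{E}\,\mathrm{Tr}(e^{hS_m})$. First we want to show

$$\mathbb{E}\,\mathrm{Tr}(e^{hS_m}) \le k\left(e^h\tau_{p,\max} + 1 - \tau_{p,\max}\right)^m. \qquad (33)$$

Use the Golden–Thompson inequality to write $\mathbb{E}\,\mathrm{Tr}(e^{hS_m}) \le \mathbb{E}\,\mathrm{Tr}(e^{hS_{m-1}}e^{hX_m})$. For now use the notation shortcut $\tau = \tau_{p,\max}$. Because $X_m$ is positive semidefinite, satisfies $\varsigma_{\max}(X_m) \le 1$, and is independent of $S_{m-1}$, use Lemma 1 (with $e^{hS_{m-1}}$ in place of $C$) to obtain $\mathbb{E}\,\mathrm{Tr}(e^{hS_m}) \le \mathbb{E}\,\mathrm{Tr}(e^{hS_{m-1}})\,(e^h\tau + (1 - \tau))$. Repeat the argument $m - 1$ more times for $S_{m-1}, S_{m-2}, \dots$, where we finally get $\mathbb{E}\,\mathrm{Tr}(e^{hS_m}) \le \mathrm{Tr}(I)\,(e^h\tau + 1 - \tau)^m$ and $\mathrm{Tr}(I) = k$, showing (33). Putting the previous facts together, for $t > 0$ we have the bound

$$\Pr\{\varsigma_{\max}(S_m) > m\cdot(\tau + t)\} \le k\,e^{-hm(\tau+t)}\left(e^h\tau + 1 - \tau\right)^m.$$

Optimize the bound by setting $\tau e^h = (\tau + t)(1 - \tau)/(1 - \tau - t)$, see (4.7) in [33], where we enforce $\tau + t < 1$ to guarantee a positive solution for $h$. Using some manipulations, we can express $\Pr\{\varsigma_{\max}(S_m) > m\cdot(\tau + t)\} \le k\,e^{-mD(\tau+t\|\tau)}$, where $D(\cdot\|\cdot)$ is the binary information divergence. Finally, equating $\Pr\{\sigma^2_{\max}(A_S) > a\} = \Pr\{\varsigma_{\max}(S_m) > ma/k\}$ and $\Pr\{\varsigma_{\max}(S_m) > m\cdot(\tau + t)\}$, we set $t = a/k - \tau$, and we have proved the first inequality, where the limits on $a$ follow from $\tau + t = a/k < 1$ and $t = a/k - \tau > 0$.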
The second inequality is shown very similarly. For the rest of the proof, use the notation shortcut $\tau = \tau_{p,\min}$. Starting from the equation below (20) and repeating similar arguments, we can show

$$\Pr\{\varsigma_{\max}(-S_m) > -m\cdot(\tau + t)\} \le k\,e^{hm(\tau+t)}\left(e^{-h}\tau + 1 - \tau\right)^m,$$

which is optimized by setting $\tau e^{-h} = (\tau + t)(1 - \tau)/(1 - \tau - t)$ to get $\Pr\{\varsigma_{\max}(-S_m) > -m\cdot(\tau + t)\} \le k\,e^{-mD(\tau+t\|\tau)}$, where we enforce $\tau + t > 0$ to guarantee a positive solution for $h$. Equating $\Pr\{\sigma^2_{\min}(A_S) < a\} = \Pr\{\varsigma_{\max}(-S_m) > -ma/k\}$ and $\Pr\{\varsigma_{\max}(-S_m) > -m\cdot(\tau + t)\}$, we set $t = a/k - \tau$ to prove the second inequality, where the limits on $a$ follow from $\tau + t = a/k > 0$ and $t = a/k - \tau < 0$. ■
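The optimization step in the first part of this proof can be checked numerically. The sketch below uses illustrative values of $m$, $\tau$, $t$ with $t > 0$ and $\tau + t < 1$. It evaluates the logarithm of $e^{-hm(\tau+t)}(e^h\tau + 1 - \tau)^m$ at the closed-form optimizer $h^\star = \log\frac{(\tau+t)(1-\tau)}{(1-\tau-t)\tau}$ and compares it with $-mD(\tau+t\|\tau)$ and with a grid search over $h$.

```python
import math

def log_bound(h, m, tau, t):
    """log of e^{-h m (tau + t)} (tau e^h + 1 - tau)^m (the factor k is omitted)."""
    return -h * m * (tau + t) + m * math.log(tau * math.exp(h) + 1 - tau)

def kl(x, y):
    """Binary information divergence D(x || y) in nats."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

m, tau, t = 100, 0.2, 0.15            # illustrative values with t > 0 and tau + t < 1
h_star = math.log((tau + t) * (1 - tau) / ((1 - tau - t) * tau))  # from tau e^h = (tau+t)(1-tau)/(1-tau-t)
at_opt = log_bound(h_star, m, tau, t)
grid_min = min(log_bound(0.001 * i, m, tau, t) for i in range(1, 5000))
print(at_opt, -m * kl(tau + t, tau), grid_min)
```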
Proof of Proposition 1: We continue where we left off in (23). Apply the Golden–Thompson inequality to the term $\mathbb{E}\{\mathrm{Tr}(e^{h_1S_m})\mathrm{Tr}(e^{h_2T_m})\}$ to get

$$\mathbb{E}\{\mathrm{Tr}(e^{h_1S_m})\mathrm{Tr}(e^{h_2T_m})\} \le \mathbb{E}\{\mathrm{Tr}(e^{h_1S_{m-1}}e^{h_1X_m})\mathrm{Tr}(e^{h_2T_{m-1}}e^{h_2Y_m})\}.$$

Apply the first inequality in (24), followed by the Golden–Thompson inequality and then (24) again, and so on, to show (using some further algebraic manipulations) that for $t > 0$

$$\Pr\{\varsigma_{\max}(S_m) > m(\tau + t), \varsigma_{\max}(T_m) > m(\tau + t)\} \le k^2e^{-m(\tau+t)(h_1+h_2)}\left\{c_2(c_1e^{h_1} + 1 - c_1)(c_1e^{h_2} + 1 - c_1) + 1 - c_2\right\}^m, \qquad (34)$$

where the constants are $c_1 = \tau_q/\tau_{p,\max}$ and $c_2 = \tau_{p,\max}^2/\tau_q$, and we used the shorthand $\tau = \tau_{p,\max}$. Differentiating the exponent of (34) with respect to $h_1$ and $h_2$, we get respectively

$$(c_1e^{h_2} + 1 - c_1)\left(c_1e^{h_1}(1 - \tau - t) - (1 - c_1)(\tau + t)\right) = (c_2^{-1} - 1)(\tau + t),$$
$$(c_1e^{h_1} + 1 - c_1)\left(c_1e^{h_2}(1 - \tau - t) - (1 - c_1)(\tau + t)\right) = (c_2^{-1} - 1)(\tau + t).$$

To solve the previous two equations it suffices to have $h = h_1 = h_2$. Then, by substituting $\alpha = c_1e^h$, we solve the quadratic equation $f(\alpha) = \alpha^2 + b\alpha + c = 0$, where $b = (1 - c_1)(1 - 2(\tau + t))/(1 - \tau - t)$ and $c = -(\tau + t)\left[(1 - c_1)^2 + (c_2^{-1} - 1)\right]/(1 - \tau - t)$. Under the assumption $c_2 < 1$, the solution (35) (see page bottom) for $\alpha = c_1e^h$ will exist for some positive $h > 0$ if we constrain $\tau + t < 1$. To see this, check that for $0 < t < 1 - \tau$ the RHS of (35) increases monotonically as $t$ increases, and verify that if we set $t = 0$ the RHS of (35) equals $c_1$ (in which case $h = 0$). We then write (36) (see page bottom), where $C_4 = C_4(a, k, c_1, c_2)$ is given as in (15) after equating $a/k = \tau + t$. Substituting (36) back into (34), and by further algebraic manipulations, we get

$$\Pr\{\varsigma_{\max}(S_m) > m(\tau + t), \varsigma_{\max}(T_m) > m(\tau + t)\} \le k^2e^{-m\left(2D(\tau+t\|c_1) + C_3(a,k,c_1,c_2)\right)},$$

where $C_3 = C_3(a, k, c_1, c_2)$ is given in (14) after equating $a/k = \tau + t$, and we get the desired result. The limits on $a$ follow as before: $\tau + t = a/k < 1$ and $t = a/k - \tau > 0$.

Alternatively, it might be easier to verify that when $t = 0$, both of the two equations displayed above are satisfied when we set $h_1 = h_2 = 0$.

For the other case we have $c_1 = \tau_q/\tau_{p,\min}$ and $c_2 = \tau_{p,\min}^2/\tau_q < 1$, and we similarly show

$$\Pr\{\varsigma_{\max}(-S_m) > -m(\tau + t), \varsigma_{\max}(-T_m) > -m(\tau + t)\} \le k^2e^{m(\tau+t)(h_1+h_2)}\left\{c_2(c_1e^{-h_1} + 1 - c_1)(c_1e^{-h_2} + 1 - c_1) + 1 - c_2\right\}^m,$$

where we use the shorthand $\tau = \tau_{p,\min}$. Again, as in the second part of the proof of Theorem C, we now constrain $\tau + t > 0$ and proceed similarly as before. Now treating $e^{-h}$ instead of $e^h$, the expression for $c_1e^{-h}$ simply equals the RHS of (36), whereby the said RHS decreases monotonically as $t$ decreases in the range $-\tau < t < 0$. Hence we conclude as before that $c_1e^{-h}$ equals the RHS of (36) for some positive $h$, and essentially the same expressions follow. The limits on $a$ follow from $\tau + t = a/k > 0$ and $t = a/k - \tau < 0$. ■
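The root of the quadratic in the proof of Proposition 1 can also be checked numerically. Assuming the illustrative constants $\tau_{p,\max} = 1/k$ and $\tau_q = 3/k^2$ (so that $c_2 = 1/3 < 1$), the sketch below computes the positive root $\alpha = c_1e^h$. It then confirms that $t = 0$ gives $\alpha = c_1$ (i.e., $h = 0$) and checks that the root satisfies the stationarity equation with $h_1 = h_2 = h$.

```python
import math

def alpha_root(tau, t, c1, c2):
    """Positive root of alpha^2 + b alpha + c = 0 with b, c as in the proof of Proposition 1."""
    b = (1 - c1) * (1 - 2 * (tau + t)) / (1 - tau - t)
    c = -(tau + t) * ((1 - c1) ** 2 + (1 / c2 - 1)) / (1 - tau - t)
    return (-b + math.sqrt(b * b - 4 * c)) / 2  # c < 0, so exactly one root is positive

def stationarity_residual(alpha, tau, t, c1, c2):
    """LHS - RHS of either derivative equation with h1 = h2 and alpha = c1 e^h."""
    lhs = (alpha + 1 - c1) * (alpha * (1 - tau - t) - (1 - c1) * (tau + t))
    return lhs - (1 / c2 - 1) * (tau + t)

# Illustrative constants (assumed): tau_p = 1/k, tau_q = 3/k^2, so c1 = 3/k and c2 = 1/3
k = 10
tau, tau_q = 1 / k, 3 / k**2
c1, c2 = tau_q / tau, tau**2 / tau_q
alpha0 = alpha_root(tau, 0.0, c1, c2)   # t = 0 should give alpha = c1, i.e. h = 0
alpha = alpha_root(tau, 0.2, c1, c2)    # some t with 0 < t < 1 - tau
h = math.log(alpha / c1)
print(alpha0, c1, stationarity_residual(alpha, tau, 0.2, c1, c2), h)
```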